
DIGITAL VIDEO PROCESSING

Department of Electrical Engineering, Hopeman 413, University of Rochester, Rochester, New York 14627. Ph: (716) 275-3774, FAX: (716) 473-0486, E-mail: tekalp@ee.rochester.edu

The fundamentals of digital video representation, filtering and compression, including popular algorithms for 2-D and 3-D motion estimation, object tracking, frame rate conversion, deinterlacing, image enhancement, and the emerging international standards for image and video compression, with such applications as digital TV, web-based multimedia, videoconferencing, videophone and mobile image communications. Also included are more advanced image compression techniques such as entropy coding, subband coding and object-based coding.

EE 449 Spring 1997 A. Murat Tekalp

PART 1: REPRESENTATION

Lecture 1  Introduction to Analog and Digital Video
Lecture 2  Time-Varying Image Formation Models
Lecture 3  Spatio-Temporal Sampling
Lecture 4  Sampling Structure Conversion

PART 2: MOTION ANALYSIS

Lecture 5  Optical Flow Methods
Lecture 6  Block-Based Methods
Lecture 7  Pel Recursive Methods
Lecture 8  Bayesian Methods
Lecture 9  Parametric Modeling and Motion Segmentation
Lecture 10 2-D Motion Tracking
Lecture 11 3-D Motion and Structure Estimation
Lecture 12 Stereo Video

PART 3: FILTERING

Lecture 13 Motion-Compensated Filtering
Lecture 14 Standards Conversion
Lecture 15 Noise Filtering
Lecture 16 Restoration
Lecture 17 Superresolution

PART 4: STILL-IMAGE COMPRESSION

Lecture 18 Fundamentals and Lossless Coding
Lecture 19 DPCM and Transform Coding
Lecture 20 Still Image Compression Standards
Lecture 21 Subband/Wavelet Coding and Vector Quantization

PART 5: VIDEO COMPRESSION

Lecture 22 Interframe Compression Methods
Lecture 23 Frame-Based Video Compression Standards
Lecture 24 Object-Based Coding and MPEG-4
Lecture 25 Digital Video Communication

Textbook:
Digital Video Processing, by A. Murat Tekalp, Prentice-Hall, 1995.

Supplementary Reading:
Video Engineering, by Inglis and Luther, Second Ed., McGraw Hill, 1996. (covers fundamentals of analog and digital video systems, including HDTV, CATV, terrestrial and satellite video broadcast technologies.)

Video Dialtone Technology, by Minoli, McGraw Hill, 1995. (covers digital video over ADSL, HFC, FTTC and ATM technologies, including interactive TV and video-on-demand.)

Grading:
Homeworks 25%
Midterm Project 25% (written report due Mar. 6)
Final Project 50% (to be presented May 6-8; written report due May 12)

Prerequisites:
EE 446 and EE 447 or EE 241 and permission of the instructor.


Digital Video Processing

© 1995-97 Prof. A. M. Tekalp

LECTURE 1 INTRODUCTION TO DIGITAL VIDEO

1. Analog Video
2. Digital Video
3. Digital Video Standards
4. Digital Video Applications
   - Digital TV
   - PC Multimedia
   - Real-time Communications

5. Digital Video Processing

© 1995-97. This material is the property of A. M. Tekalp. It is intended for use only as a teaching aid when teaching a regular semester or quarter based course at an academic institution using the textbook "Digital Video Processing" (ISBN 0-13-190075-7) by A. M. Tekalp. Any other use of this material is strictly prohibited.


ANALOG VIDEO

One or more analog signals that contain a time-varying 2-D intensity (monochrome or color) pattern and the timing information to align the pictures.

Component Analog Video (CAV)
- RGB
- YCrCb (YIQ or YUV)

Composite Video
- NTSC (National Television Standards Committee)
- PAL (Phase Alternating Line)
- SECAM (SEquential Color And Memory)

S-Video (Y/C video)
- NTSC
- PAL
- SECAM


Scanning and Frame-Rate

Frame rate and flicker: Each complete picture is called a frame (temporal sampling). The minimum frame rate required for flicker-free viewing is 50 Hz.

Progressive scan: Each frame is made up of lines (vertical sampling).
[Figure: raster scanning — a) progressive scan, b) interlaced scan.]

Interlaced scan, where each frame is split into two fields, provides a tradeoff between temporal and vertical resolution.


International TV Scanning Standards


Standard                          Aspect Ratio  Interlace  Frames/s  Total/Active Lines  BW (MHz)
NTSC (USA, Japan, Can., Mex.)     4:3           2:1        29.97     525/480             4.2
PAL (Great Britain)               4:3           2:1        25        625/580             5.5
PAL (Germany, Austria, Italy)     4:3           2:1        25        625/580             5.0
PAL (China)                       4:3           2:1        25        625/580             6.0
SECAM (France, Russia)            4:3           2:1        25        625/580             6.0

Computer Scanning Standards


Resolution    Color Mode  Interlace  Frames/s  Lines  Lines/s  Data Rate (MB/s)
640 x 480     8 bpp       No         60        525    31,500   18.4
640 x 480     24 bpp      No         70        525    36,750   64.5
1024 x 768    8 bpp       No         70        800    56,000   55.0
1280 x 1024   4 bpp       No         70        1100   77,000   45.9


Synchronization
Scanning at the display device must be synchronized with that at the source.

[Figure: NTSC video signal for one full line — active line time 53.5 μs, horizontal retrace 10 μs, horizontal sync pulse 5 μs; signal levels (percent): sync 100, blanking/black 75, white 12.5.]

Blanking pulses are inserted during the retrace intervals to blank out retrace lines on the receiving CRT. Sync pulses are added on top of the blanking pulses to synchronize the receiver's horizontal and vertical sweep circuits. The timing of the sync pulses is different for interlaced and non-interlaced video.


Resolution and Bandwidth


    Video BW = (1/2) (FR)(NL)(HR) / η

where
    FR = frame rate (frames/s)
    NL = number of lines/frame
    HR = horizontal resolution (pixels/line)
    η  = fraction of time allocated to the active video signal per line

Example: NTSC signal
    η = 53.5 / 63.5 = 0.84
    Video BW = 4.2 MHz
    Line Rate = (FR)(NL) = 29.97 x 525 = 15,734 lines/s
    HR = 2 x 4.2x10^6 x 0.84 / 15,734 = 448 pixels
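The arithmetic in this example is easy to check in a few lines of Python (a sketch of our own; η is rounded to 0.84 as on the slide):

```python
# Check the NTSC numbers in the example above.
FR = 29.97            # frame rate (frames/s)
NL = 525              # total lines per frame
eta = 0.84            # 53.5/63.5: fraction of line time carrying active video
BW = 4.2e6            # NTSC video bandwidth (Hz)

line_rate = FR * NL                  # lines/s
# Invert Video BW = (1/2)(FR)(NL)(HR)/eta for the horizontal resolution:
HR = 2 * BW * eta / line_rate        # resolvable pixels per active line

print(round(line_rate), round(HR))   # 15734 448
```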


Spectral Content and Chrominance


[Figure: spectrum of the scanned video signal for still images — harmonics at multiples of the line rate, with fine structure around each harmonic.]

[Figure: spectrum of the NTSC video signal — a 6 MHz channel with the picture carrier at 1.25 MHz, a 4.2 MHz vestigial sideband, the color subcarrier at 4.83 MHz, and the audio carrier at 5.75 MHz.]


Analog Video Acquisition

- Electronic (CCD) video cameras: ITU-R standards 625/25 or 525/30; recorded on video tape
- Motion picture cameras: 24 frames/s; recorded on motion picture film
- Synthetic content: computer animation, graphics, etc.; formed by sequential ordering of a set of still-frame images

Analog Video Recording


- Composite Video: VHS, U-matic
- Y/C Video: S-VHS
- CAV: Betacam


DIGITAL REVOLUTION

Digital data communications (e.g., computer networks, e-mail) and Digital audio (e.g., CD players, digital telephony)
What is next?

Digital video - as a form of computer data. Products such as digital TV/HDTV, videophone, and multimedia PCs will be in the marketplace soon.

[1] "Digital video," IEEE Spectrum Magazine, pp. 24-30, Mar. 1992.


What is the bottleneck for Digital Video?

Let's look at the raw data rates for digital audio and video:

CD quality digital audio:
    44 kHz sampling rate x 16 bits/sample -> approximately 700 kbps

High definition video:
    (1280 pels x 720 lines luma + 2 x 640 pels x 360 lines chroma)
    x 60 frames/s x 8 bits/pel/channel -> approximately 663.5 Mbps
    (from the GA-HDTV proposal)

A picture is worth 1000 words!!
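Both raw rates follow directly from the sampling parameters; a quick Python check (our own sketch):

```python
# CD-quality audio: 44 kHz sampling x 16 bits/sample (per channel)
audio_bps = 44_000 * 16
print(audio_bps)                 # 704000 bits/s -> "approximately 700 kbps"

# GA-HDTV raw video: 1280x720 luma plus two 640x360 chroma channels,
# at 60 frames/s and 8 bits per pel per channel
pels_per_frame = 1280 * 720 + 2 * 640 * 360
video_bps = pels_per_frame * 60 * 8
print(video_bps / 1e6)           # 663.552 -> "approximately 663.5 Mbps"
```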


Inglis and Luther, Video Engineering, McGraw Hill, pp. 160-178, 1996.


Digital Video Studio Standards


                                    ITU-R 601       ITU-R 601          CIF
                                    525/60 (NTSC)   625/50 (PAL/SECAM)
Active pels/line,  Lum (Y)          720             720                360
Active pels/line,  Chroma (U,V)     360             360                180
Active lines/pic,  Lum (Y)          480             576                288
Active lines/pic,  Chroma (U,V)     480             576                144
Interlacing                         2:1             2:1                1:1
Temporal rate                       60              50                 30
Aspect ratio                        4:3             4:3                4:3
Raw data rate (Mbps)                165.9           165.9              37.3

CIF: Common Intermediate Format


Image/Video Compression Standards


CCITT G3/G4   binary images (non-adaptive)
JBIG          binary images
JPEG          still frame gray scale and color images
H.261         ISDN applications (p x 64 kbps)
H.263         PSTN applications (less than 64 kbps)
H.263+        low-bitrate PSTN applications (underway)
MPEG-1        optical storage media (1.5 Mbps)
MPEG-2        generic coding (4-20 Mbps)
MPEG-4        object-based functionalities (underway)

The boom in the FAX market followed binary image compression standards.


Digital Video Exchange Standards


Intel Corp.                     DVI (Digital Video Interactive), Indeo
Apple Computer                  QuickTime
Philips Consumer Electronics    CD-I (Compact Disc Interactive)
Eastman Kodak Company           PhotoCD

A committee under the Society of Motion Picture and Television Engineers (SMPTE) is working to develop a universal header/descriptor that would make any digital video stream recognizable by any device. There are also digital recording standards, e.g., D1 (component video), D2 (composite video), etc.


APPLICATIONS OF DIGITAL VIDEO

Consumer/Commercial
- All-digital HDTV @ 20 Mbits/s over 6 MHz taboo channels
- Digital TV @ 4-6 Mbits/s
- Multimedia, desktop video @ 1.5 Mbits/s (CD-ROM or hard disk storage)
- Videoconferencing @ 384 kbits/s using p x 64 kbits/s ISDN channels
- Videophone and mobile image communications @ 16 kbits/s using the copper network (POTS)

Other
- Surveillance imaging (military or law enforcement)
- Intelligent vehicle highway systems and harbor traffic control
- Medical imaging (cine imaging)
- Education and scientific research


Digital TV

Choices for ATV broadcast channels:
- terrestrial broadcast
- direct satellite broadcast
- optical fiber cable broadcast

Terrestrial broadcast channels are 6 MHz in the US and 8 MHz in Europe. A 6 MHz channel can support about a 20-30 Mbps data rate using sophisticated modulation techniques (e.g., QAM or VSB).

- To broadcast digital HDTV over a 6-MHz channel, we need about 663.5 : 20 ≈ 33 : 1 compression.
- A single 6-MHz TV channel can support 4 or 5 standard resolution digital TV programs (at 4-6 Mbits/s each).
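The required compression ratio is just the quotient of the two rates (a Python check of our own; the 20 Mbps payload figure is the one quoted above):

```python
raw_hdtv_mbps = 663.5        # raw GA-HDTV rate from the earlier slide
channel_mbps = 20.0          # usable payload of a 6-MHz broadcast channel

ratio = raw_hdtv_mbps / channel_mbps
print(round(ratio))          # about 33:1 compression is needed
```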

[1] "Digital television," IEEE Spectrum, April 1995.


PC Multimedia

Early technologies
- Compact Disc-Interactive (CD-I): CD-based interactive full-screen, full-motion video
- Digital Video Interactive (DVI) Technology: hardware to handle full-motion video in PCs at about 1.5 Mbit/s

VideoCD and Digital Video Disk (DVD)

Networked Multimedia / Video-on-Demand

[1] "Special report: Interactive multimedia," IEEE Spectrum, pp. 22-39, Mar. 1993.
[2] J. van der Meer, "The full motion system for CD-I," IEEE Trans. Cons. Electronics, vol. 38, no. 4, pp. 910-920, Nov. 1992.
[3] J. Sutherland and L. Litteral, "Residential video services," IEEE Comm. Mag., pp. 37-41, July 1992.


Real-Time Communications

Digital Audio: The audio signal is sampled at 8 kHz and quantized with 8-12 bits/sample. Most telephony networks can carry 14 kbps to 56 kbps; bit-rate reduction is achieved by coarser quantization.

Videoconferencing/videophone over ISDN: up to 2 Mbits/s using H.261 or H.263 compression.

Videophone over existing phone lines: 8-32 kbits/s using H.263 or H.263+ compression.

Video communications over future broadband ATM/access networks:
- Constant Bit Rate (CBR) channel - switched network
- Variable Bit Rate (VBR) channel - quality of service contract
- Available Bit Rate (ABR) channel - no guarantees, just like the Internet

Packet Video

The video bitstream is divided into elementary blocks (fixed or variable size), each containing a header and payload (data bits), e.g., MPEG-2 packets.

Packet video allows
- interleaving video, audio, and data packets, and multiple programs in a single bitstream
- better error protection and resilience, and low delay

Network infrastructures
- Telephone networks
- CableTV networks
- Internet (network of networks)

Modes of transmission
- Point-to-point transmission
- Multi-casting and broadcasting


Access Networks

Fiber-to-Home
Hybrid-Fiber-Coax (Cable Modem)
Fiber-to-Curb (ADSL to home)

Some Access Network Bit-Rate Regimes:

Conventional Telephone Modem                  28.8 kbps
ISDN (Integrated Services Digital Network)    64-144 kbps (p x 64)
T-1                                           1.5 Mbps
ADSL (Asymmetric Digital Subscriber Line)     1.5-6 Mbps downstream
Cable Modem                                   30 Mbps downstream
Ethernet (packet-based LAN)                   10 Mbps
Fiber B-ISDN/ATM                              55-200 Mbps


Available Videoconferencing Products


Vendor              Products
BT North America    Videocodec VC2200, Videocodec VC2100
GPT Video Systems   System 261 Twin chan., System 261 Universal
Compres. Labs.      Rembrandt II/VP
NEC America         VisualLink 5000 M20, VisualLink 5000 M15
PictureTel Corp.    System 4000
Video Telecom       CS350

Codec speeds range from 56 kbit/s up to 2048 kbit/s; maximum frame rates range from 10 to 30 frames/s. Compression algorithms include H.261 plus proprietary modes (CTX/CTX Plus, NEC proprietary, SG3/SG2 HVQ, Blue Chip). Prices range from $19,900 (mono) to $42,000.


Available Videophone Products


Product                                    Compression Alg.                  Data Rate          Price
AT&T Videophone 2500                       MC DCT, 10 frames/s (max)         16.8/19.2 kbit/s   $995
British Telecom/Marconi Relate 2000        H.261-like, 7.5 (3.75) frames/s   9.6/14.4 kbit/s    $1,275 (pair)
COMTECH Labs. STU-3 Secure Videophone      MC DCT, QCIF resolution           9.6 kbit/s         under $1,000
ShareVision                                MC DCT                            14.4 kbit/s        $4,000 (pair)


Comparison of Analog and Digital Video Systems


- Digital representation is robust: error correction minimizes the effect of transmission/storage media distortion, noise and other degradations.
- Digital video can be transmitted with lower bandwidth than analog video of equivalent subjective quality by using digital compression.
- Digital video enables integration of networked PC multimedia, broadcast TV, and real-time communications (videophone and videoconferencing) in a unified system architecture.
- Digital video provides flexibility for signal processing for enhancement, standards conversion, composition, special effects, nonlinear editing, etc.


Challenges in Digital Video Processing

(i) Motion Analysis
- 2-D motion/optical-flow estimation and segmentation
- 3-D motion, structure estimation and segmentation
- Object tracking, occlusion, deformations

(ii) Filtering and Standards Conversion
- Deblurring, noise filtering, edge sharpening
- Frame rate conversion and deinterlacing
- Resolution enhancement

(iii) Compression
- JPEG, H.261/H.263, MPEG 1-2
- Subband/wavelet and model-based coding


Differences Between Still-Frame and Video Processing


- Some tasks, such as motion estimation or the analysis of a time-varying scene, cannot be performed on the basis of a single image.
- Utilization of temporal redundancies that naturally exist in an image sequence to develop effective algorithms:
  - Motion-compensated filtering
  - Motion-compensated prediction


LECTURE 10 2-D MOTION TRACKING


1. Token Tracking
2. Boundary Tracking
3. Object Tracking
   - Single-Object Tracking
   - Multiple-Object Tracking
4. Object-Based Representation (Layering, Alpha-Plane, Mosaicing, etc.)


TOKEN TRACKING

2-D Trajectory Model: Describe the temporal evolution of selected feature points, e.g.,

    x1(k+1) = x1(k) cos θ(k) - x2(k) sin θ(k) + t1(k)
    x2(k+1) = x1(k) sin θ(k) + x2(k) cos θ(k) + t2(k)

with a 2-D rotation by the angle θ(k) and translation by t1(k) and t2(k).

Observation Model: Determine a number of feature correspondences over multiple frames, e.g., by block matching.

Batch or Recursive Estimation: Find the best motion parameters consistent with the model and observations. Batch estimators, e.g., the nonlinear least squares estimator, process the entire data record at once after all data is collected. Recursive estimators, e.g., Kalman filters, process each observation as it becomes available to update the motion parameters.


Example: Tracking 2-D line segments

Each line segment is represented by a 4-D feature vector p = [p1 p2]^T consisting of the two end points, p1 and p2. The 2-D trajectory of the endpoints is modeled by

    x(k) = x(k-1) + v(k-1) Δt + (1/2) a(k-1) (Δt)^2
    v(k) = v(k-1) + a(k-1) Δt
    a(k) = a(k-1)

where x(k), v(k), and a(k) denote the position, velocity, and acceleration of the pixel at time k, respectively (constant acceleration model). To perform tracking by a Kalman filter, we define the 12-dimensional state of the line segment as

    z(k) = [ p(k)^T  ṗ(k)^T  p̈(k)^T ]^T

where ṗ(k) and p̈(k) denote the velocity and the acceleration of the coordinates, respectively.


Example: (cont'd)

The state propagation equation:

    z(k) = Φ(k, k-1) z(k-1) + w(k),   k = 1, ..., N

where

    Φ(k, k-1) = [ I4   I4 Δt   (1/2) I4 (Δt)^2 ]
                [ 04   I4      I4 Δt           ]
                [ 04   04      I4              ]

I4 and 04 are 4 x 4 identity and zero matrices, respectively, and w(k) is a zero-mean, white process with covariance matrix Q(k).

The observation equation:

    y(k) = p(k) + v(k),   k = 1, ..., N

It is assumed that the noisy observations can be estimated from pairs of frames using some token-matching algorithm.
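A NumPy sketch of one predict/update cycle of such a Kalman filter (the Phi and H matrices follow the constant-acceleration model above; the Q, R values and the observation are placeholders of our own, not the course's):

```python
import numpy as np

dt = 1.0
I4, O4 = np.eye(4), np.zeros((4, 4))

# State transition Phi(k, k-1) for the 12-D state z = [p, p_dot, p_ddot]
Phi = np.block([[I4, dt * I4, 0.5 * dt**2 * I4],
                [O4, I4,      dt * I4],
                [O4, O4,      I4]])
H = np.hstack([I4, O4, O4])      # only the endpoint coordinates p(k) are observed

Q = 1e-3 * np.eye(12)            # process noise covariance (placeholder)
R = 1e-1 * np.eye(4)             # observation noise covariance (placeholder)

def kalman_step(z, P, y):
    z_pred = Phi @ z                          # predict
    P_pred = Phi @ P @ Phi.T + Q
    S = H @ P_pred @ H.T + R                  # update with observation y(k)
    K = P_pred @ H.T @ np.linalg.inv(S)
    z_new = z_pred + K @ (y - H @ z_pred)
    P_new = (np.eye(12) - K @ H) @ P_pred
    return z_new, P_new

z, P = np.zeros(12), np.eye(12)
y = np.array([1.0, 2.0, 3.0, 4.0])            # endpoints from token matching
z, P = kalman_step(z, P, y)
print(z[:4])                                  # filtered endpoint estimate
```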


BOUNDARY TRACKING
- Polygon tracking (by tracking corners)
- Splines and active contours
  - Propagate joint points by their motion vectors
  - Define various energy functions to snap the propagated snake to the contour in the next frame


OBJECT TRACKING
Object-Based Editing    -> Synthetic Transfiguration
Object-Based Coding     -> MPEG-4
Content-Based Retrieval -> Digital Libraries
3-D Object Modeling     -> Virtual Reality


Triangle-Based Affine MC

- Standard translational block matching cannot handle rotation and zooming.
- Neighboring relationships in the reference frame are preserved in the target frame. (Mesh elements do not overlap each other.)

[Figure: texture mapping from the triangular mesh in frame k-1 to the deformed mesh in frame k.]


SINGLE OBJECT TRACKING


- 2-D mesh based region tracking (rather than token or boundary tracking)
- Projection of the mesh from frame to frame (no temporal dynamic model) - mild deformations
- 2-D mesh design (regular, adaptive, or content-based) - object boundaries known
- Closed-form solutions and fast search for node motion refinement
- Compensation of additive and multiplicative illumination differences


2-D Mesh Design


Regular Mesh: Simple; no need to store node locations as part of the syntax. Boundaries may not align with gray-level or motion edges.

Adaptive Mesh: Split-merge refinement of a regular mesh to align triangles with edges. Split instructions can be easily incorporated into the syntax.

Content-Based Mesh: Mesh optimized according to image content. Costly; all node locations need to be stored/transmitted.


Content-Based Mesh Design

- Node-point selection
- Delaunay triangulation

[Figure: node points placed densely in regions of high temporal activity and sparsely in regions of low temporal activity, so that the sum of the DFD within each circle is the same; marked vs. unmarked pixels.]


Node-Point Selection

1. Estimate the 2-D forward dense motion field; find and polygonize the BTBC region.
2. Label all pixels within the BTBC polygon "marked," and include its corners in the list of node points.
3. Compute the average DFD over the unmarked region.
4. Compute a cost function C(x, y) over the unmarked region.
5. Select the unmarked pixel with the highest C(x, y) which is not closer to any of the existing node points than a prespecified distance as the next node point.
6. Grow a region about this node point until the sum of the absolute DFD reaches a threshold. Label all points within this region as "marked."
7. Continue until the maximum number of node points is reached, or all pixels are "marked."
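The greedy selection loop can be sketched as follows (a simplified illustration of our own: the cost function, the DFD field, and the region growing are stand-ins — here growing a square instead of an arbitrary region, and omitting the BTBC polygon step):

```python
import numpy as np

def select_nodes(cost, dfd, max_nodes, min_dist, dfd_budget):
    """Greedy node-point selection sketch: repeatedly take the unmarked
    pixel with the highest cost, then mark a growing square about it
    until the accumulated |DFD| reaches dfd_budget."""
    h, w = cost.shape
    marked = np.zeros((h, w), dtype=bool)
    nodes = []
    while len(nodes) < max_nodes and not marked.all():
        c = np.where(marked, -np.inf, cost)
        y, x = np.unravel_index(np.argmax(c), c.shape)
        marked[y, x] = True
        if any((y - ny) ** 2 + (x - nx) ** 2 < min_dist ** 2 for ny, nx in nodes):
            continue                      # too close to an existing node point
        nodes.append((y, x))
        r = 0
        while r < max(h, w):              # grow a square about the new node
            r += 1
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            if np.abs(dfd[y0:y1, x0:x1]).sum() >= dfd_budget:
                break
        marked[y0:y1, x0:x1] = True       # label the grown region "marked"
    return nodes

rng = np.random.default_rng(0)
dfd = rng.random((32, 32))                # synthetic |DFD| field
nodes = select_nodes(dfd, dfd, max_nodes=10, min_dist=4, dfd_budget=20.0)
print(len(nodes))
```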


Node-Point Motion Estimation

- Sampling from the dense motion field
- Logarithmic hexagonal search (hierarchical)
- Closed-form connectivity-preserving solutions
  - Node-based (polygon matching)
  - Patch-based


Closed-Form Polygon Matching


- All N sets of affine parameters should yield the same motion vector at the center node.
- Affine parameters of two neighboring patches should yield the same motion vectors along their common boundary (line segment).
- Given at least N + 1 correspondences within the hexagon, a linear least squares solution can be found to determine all N sets of affine parameters.
- Given the spatio-temporal intensity gradients, a linear solution can be found by constrained minimization (Lagrange optimization).
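The linear least squares step can be illustrated for a single patch: given point correspondences, the six affine parameters follow from one linear solve (a generic sketch of our own, not the constrained hexagonal formulation):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit dst ~ A @ src + t from (N, 2) arrays of
    corresponding points, N >= 3. Parameter order: a11, a12, a21, a22, t1, t2."""
    n = src.shape[0]
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = src            # x' = a11*x + a12*y + t1
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = src            # y' = a21*x + a22*y + t2
    M[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(M, dst.reshape(-1), rcond=None)
    return params[:4].reshape(2, 2), params[4:]

# Recover a known affine map from noiseless correspondences:
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
t_true = np.array([3.0, -2.0])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src @ A_true.T + t_true
A, t = fit_affine(src, dst)
print(np.allclose(A, A_true), np.allclose(t, t_true))   # True True
```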


An Example: 2-D Mesh Fitting

- Select a polygon enclosing the region of interest
- Overlay a 2-D mesh (e.g., a uniform triangular mesh)


Motion Estimation at the Boundary Nodes

[Figure: deformable block matching of a boundary node across the reference, previous, and current frames.]

- Assumption: mild deformations
- Define a cost polygon about each boundary node
- Estimate the motion vector using deformable block matching


Mesh Propagation and Refinement


[Figure: patches A1 and A2 of the previous polygon mapped by their affine transformations onto the current polygon; node a is propagated and then refined.]

- Propagate each node using the affine mapping of the corresponding patch
- Use hexagonal matching to refine the location of each node


Hierarchical Mesh Refinement


Tracking Intensity Variations


Intensity Model:

    I_x = α I_R + c

where α is a scale factor and c an intensity offset.

- Each node point is assigned a pair of parameters (α, c)
- Values of α and c at any x are bilinearly interpolated
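The bilinear interpolation of the per-node parameters can be sketched as follows (a minimal example of our own; the four corner values of α and c are hypothetical):

```python
import numpy as np

def bilinear(corners, u, v):
    """Bilinearly interpolate values given at the four corners
    (v00, v10, v01, v11) at normalized coordinates (u, v) in [0, 1]^2."""
    v00, v10, v01, v11 = corners
    return (v00 * (1 - u) * (1 - v) + v10 * u * (1 - v)
            + v01 * (1 - u) * v + v11 * u * v)

# Hypothetical per-node illumination parameters (alpha, c):
alphas = (1.0, 1.2, 0.9, 1.1)
offsets = (0.0, 5.0, -3.0, 2.0)

# Compensate a flat 4x4 reference patch: I_x = alpha * I_R + c per pixel
I_R = np.full((4, 4), 100.0)
u, v = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
I_x = bilinear(alphas, u, v) * I_R + bilinear(offsets, u, v)
print(I_x[0, 0], I_x[-1, -1])
```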

[Diagram: object-based synthesis loop — input video -> select a polygon bounding the ROI -> mesh fitting -> corner tracking -> mesh propagation and refinement -> modified mesh; the modified mesh and a reference still image drive image synthesis -> synthesized video; go to next frame.]


MULTIPLE OBJECT TRACKING


- Occlusion-adaptive mesh modeling and design
- Motion estimation around object boundaries
- Interactions of multiple objects
- Temporary occlusions of objects
- Birth and death of objects


Frame-Based Occlusion-Adaptive Mesh Tracking


[Figure: frame k to frame k+1 — a node to be split and a new node inserted along the moving object boundary; the BTBC region and the UB (uncovered background), with mesh refinement within the UB.]

- No node points within the BTBC region
- Mesh propagation with node point motion vectors
- Model failure detection (ideally, MF region = UB region)
- Mesh refinement within the MF region


Motion Estimation Around Object Boundaries


[Figure: frame k to frame k+1 — nodes with two motion vectors and new nodes along the object boundary; BTBC and UB regions.]

- Use mesh elements from one object at a time only
- More than one motion vector for some nodes on the boundary
- BTBC regions should map onto a curve segment in the next frame


VOP-Based Object Tracking

- Each object is tracked independently.
- Uncovered areas are either assigned to one of the existing objects, or to a new object.
- Object mosaicing.


LECTURE 2

TIME-VARYING IMAGE FORMATION MODELS


1. Video Source Model
2. Modeling 3-D Rigid Motion
   - 3-D Translation, Rotation, and Scale
   - Characterization of the Rotation Matrix
3. Homogeneous Coordinates
4. Camera Models and Image Formation
   - Projective Camera -> Perspective Projection
   - Affine Camera -> Weak-Perspective and Orthographic Projection
   - Photometric Image Formation


VIDEO SOURCE MODEL

[Figure: a video source as a collection of shots, shot 1 through shot N.]

A video source is a collection of shots. A shot is a video clip recorded by an uninterrupted motion of a single camera. Shot boundaries can be clean (as in a camera break) or blurred over a few frames, as in special effects such as dissolves, wipes, fade-ins, and fade-outs.


Source Modeling of a Video Shot


[Diagram: representation of digital video — 3-D scene modeling -> image formation (plus observation noise) -> spatio-temporal sampling.]

The variation in the intensity of the images from frame to frame is due to
- 3-D camera motion, e.g., zoom and pan, etc.
- 3-D object motion, e.g., local translation and rotation
- photometric effects of 3-D motion
- change in the scene illumination

We neglect deformable body motion at this time.


MODELING 3-D RIGID MOTION

[Figure: a rigid object observed at times t_k and t_{k+1}.]

3-D displacement of a point on a rigid object:
- in the Cartesian coordinates, (X1, X2, X3): an affine transformation
- in the homogeneous coordinates, (kX1, kX2, kX3, k): a linear transformation

3-D velocity of a point on a rigid object

Modeling 3-D Displacement in the Cartesian Coordinates

3-D rotation, translation and scaling (zooming) of a rigid body can be represented by an affine transformation

    X' = S R X + T

where

    X' = [X1'  X2'  X3']^T   and   X = [X1  X2  X3]^T

denote the coordinates of a point at time instants t_{k+1} and t_k, respectively, and

    T = [T1  T2  T3]^T   and   S = [ S1  0   0  ]
                                   [ 0   S2  0  ]
                                   [ 0   0   S3 ]

are the translation vector between t_k and t_{k+1} and the scaling matrix, respectively.

Rotation: Eulerian angles in Cartesian coordinates: An arbitrary rotation in 3-D space can be represented by the Eulerian angles θ, ψ and φ of rotation about the X1, X2 and X3 axes, respectively.

[Figure: Eulerian angles of rotation, illustrated as 90-degree rotations about the X1 (1,0,0), X2 (0,1,0), and X3 (0,0,1) axes.]

The matrices that describe clockwise rotations about the individual axes are given by

    Rθ = [ 1   0       0      ]
         [ 0   cos θ  -sin θ  ]
         [ 0   sin θ   cos θ  ]

    Rψ = [  cos ψ   0   sin ψ ]
         [  0       1   0     ]
         [ -sin ψ   0   cos ψ ]

    Rφ = [ cos φ  -sin φ   0 ]
         [ sin φ   cos φ   0 ]
         [ 0       0       1 ]

An Example: Consider rotation around the X1 axis by 90 degrees:

    [ X1' ]   [ 1   0          0         ] [ 0 ]   [ 0 ]
    [ X2' ] = [ 0   cos(π/2)  -sin(π/2)  ] [ 1 ] = [ 0 ]
    [ X3' ]   [ 0   sin(π/2)   cos(π/2)  ] [ 0 ]   [ 1 ]

Recall that matrix multiplication is not commutative; thus, in composite rotations, the order of specifying the rotations is important.
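Both points — the 90-degree example and the non-commutativity — are easy to verify numerically (a NumPy sketch of our own):

```python
import numpy as np

def R_x(th):      # rotation about the X1 axis
    c, s = np.cos(th), np.sin(th)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def R_z(th):      # rotation about the X3 axis
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# 90-degree rotation about X1 maps (0, 1, 0) to (0, 0, 1):
p = np.array([0.0, 1.0, 0.0])
print(np.round(R_x(np.pi / 2) @ p, 6))

# Order matters: rotating about X1 then X3 differs from X3 then X1
print(np.allclose(R_z(np.pi / 2) @ R_x(np.pi / 2),
                  R_x(np.pi / 2) @ R_z(np.pi / 2)))   # False
```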

Assuming infinitesimal rotation from frame to frame, i.e., θ = Δθ, etc., and approximating cos Δθ ≈ 1 and sin Δθ ≈ Δθ, etc., these matrices simplify to

    RΔθ = [ 1   0    0  ]      RΔψ = [ 1    0   Δψ ]      RΔφ = [ 1    -Δφ   0 ]
          [ 0   1   -Δθ ]            [ 0    1   0  ]            [ Δφ    1    0 ]
          [ 0   Δθ   1  ]            [ -Δψ  0   1  ]            [ 0     0    1 ]

Then, neglecting second-order terms in the small angles, the composite rotation matrix R is given by

    R = RΔφ RΔψ RΔθ ≈ [ 1    -Δφ   Δψ ]
                      [ Δφ    1   -Δθ ]
                      [ -Δψ   Δθ   1  ]

Rotation about an arbitrary axis in Cartesian coordinates: A 3-D rotation can be represented by an angle θ about an axis through the origin, described by the directional cosines n1, n2 and n3.

[Figure: rotation by the angle θ about an arbitrary axis (n1, n2, n3) through the origin.]

Then,

    R = [ n1^2 + (1 - n1^2) cos θ        n1 n2 (1 - cos θ) - n3 sin θ   n1 n3 (1 - cos θ) + n2 sin θ ]
        [ n1 n2 (1 - cos θ) + n3 sin θ   n2^2 + (1 - n2^2) cos θ        n2 n3 (1 - cos θ) - n1 sin θ ]
        [ n1 n3 (1 - cos θ) - n2 sin θ   n2 n3 (1 - cos θ) + n1 sin θ   n3^2 + (1 - n3^2) cos θ      ]

For an infinitesimal solid angle Δα, R reduces to

    R = [ 1        -n3 Δα    n2 Δα ]
        [ n3 Δα     1       -n1 Δα ]
        [ -n2 Δα    n1 Δα    1     ]

and we have Δθ = n1 Δα, Δψ = n2 Δα, Δφ = n3 Δα.

Three-D Velocity Model

Start with the 3-D displacement model for rotation and translation only:

    [ X1' ]   [ 1     -Δφ    Δψ ] [ X1 ]   [ T1 ]
    [ X2' ] = [ Δφ     1    -Δθ ] [ X2 ] + [ T2 ]
    [ X3' ]   [ -Δψ    Δθ    1  ] [ X3 ]   [ T3 ]

Subtract [X1 X2 X3]^T from both sides, divide by Δt, and let Δt -> 0:

    [ Ẋ1 ]          [ 0       -Δφ/Δt   Δψ/Δt ] [ X1 ]          [ T1/Δt ]
    [ Ẋ2 ] = lim    [ Δφ/Δt    0      -Δθ/Δt ] [ X2 ] + lim    [ T2/Δt ]
    [ Ẋ3 ]   Δt->0  [ -Δψ/Δt   Δθ/Δt   0     ] [ X3 ]   Δt->0  [ T3/Δt ]

so that

    [ Ẋ1 ]   [ 0     -Ω3    Ω2 ] [ X1 ]   [ V1 ]
    [ Ẋ2 ] = [ Ω3     0    -Ω1 ] [ X2 ] + [ V2 ]
    [ Ẋ3 ]   [ -Ω2    Ω1    0  ] [ X3 ]   [ V3 ]

where Ωi and Vi denote the angular and translational velocities, respectively, for i = 1, 2, 3.


HOMOGENEOUS COORDINATES

De ne the vectors X and X0 in the homogeneous coordinates as 2 3 2 0 3 kX1 kX1 6 7 6 0 7 6 7 6 kX 0 kX 2 2 7 6 7 6 Xh = 6 and Xh = 6 0 7 7 4 kX3 5 4 kX 7 5

Then, the a ne transformation in the Cartesian coordinates X0 = AX + T can be expressed as a linear transformation in the homogeneous coordinates ~ h X0h = AX where 2 3 a11 a12 a13 T1 6 7 6 a a a T 21 22 23 2 7 ~ =6 7 A 6 4 a31 a32 a33 T3 7 5 0 0 0 1


'
where where

Digital Video Processing ~ X0 = TX


h

c 1995-98 Prof. A. M. Tekalp

Translation:
h

2 1 0 0 T1 6 6 0 1 0 T2 ~ =6 T 6 6 4 0 0 1 T3 0 0 0 1

3 7 7 7 7 7 5

Scaling (Zooming):

~X X0 = S
h

&

2 3 S1 0 0 0 6 7 6 7 0 S2 0 0 6 7 ~=6 S 7 6 4 0 0 S3 0 7 5 0 0 0 1

37

Rotation:

Xh' = R~ Xh

where

R~ = [ r11  r12  r13  0 ]
     [ r21  r22  r23  0 ]
     [ r31  r32  r33  0 ]
     [ 0    0    0    1 ]

and rij denotes the elements of the rotation matrix R in Cartesian coordinates.

GEOMETRIC IMAGE FORMATION

Imaging systems capture 2-D projections of a time-varying 3-D scene. The projection can be represented by a mapping

f: (X1, X2, X3, t) -> (x1, x2, t)

where X1, X2, X3, x1, x2, and t are continuous variables.

We consider two classes of camera models:

- Projective Camera -> Perspective (Central) Projection
- Affine Camera -> Weak-Perspective and Orthographic Projection

Projective Camera

There are three coordinate systems: camera, image, and world.

1. Camera Coordinate System: Perspective Projection

(Figure: camera coordinates (Xc, Yc, Zc), image-plane coordinates (xc, yc), and principal point (x0, y0).)

The center of projection coincides with the origin of the camera coordinates. Using similar triangles,

xc / f = Xc / Zc   and   yc / f = Yc / Zc

Perspective projection is nonlinear in Cartesian coordinates; however, it can be expressed as a linear operation in homogeneous coordinates:

[ xc ]              [ Xc ]              [ 1  0  0  0 ] [ Xc ]
[ yc ] = lambda     [ Yc ] = lambda     [ 0  1  0  0 ] [ Yc ]
[ f  ]              [ Zc ]              [ 0  0  1  0 ] [ Zc ]
                                                       [ 1  ]

where lambda = f / Zc.


2. Image Coordinate System: Intrinsic Camera Parameters

(Figure: image coordinates (xi, yi), camera coordinates (xc, yc), and principal point (x0, y0).)

xi - x0 = kx xc
yi - y0 = -ky yc

The units of kx and ky are pixels/length. There is no shear between the camera axes. In homogeneous coordinates,

f [ xi ]   [ f kx   0      x0 ] [ xc ]       [ xc ]
  [ yi ] = [ 0     -f ky   y0 ] [ yc ] = C   [ yc ]
  [ 1  ]   [ 0      0      1  ] [ f  ]       [ f  ]

where C is called the camera calibration matrix, and the principal point (x0, y0) is where the optic axis intersects the image plane.

3. World Coordinate System: Extrinsic Camera Parameters

(Figure: world coordinates (Xw, Yw, Zw) related to camera coordinates (Xc, Yc, Zc) by a rotation R and a translation t.)

[ Xc ]   [ R    t ] [ Xw ]
[ Yc ] = [         ] [ Yw ]
[ Zc ]   [ 0^T  1 ] [ Zw ]
[ 1  ]               [ 1  ]

From world coordinates to pixels:

f [ xi ]       [ 1  0  0  0 ] [ R    t ] [ Xw ]
  [ yi ] = C   [ 0  1  0  0 ] [         ] [ Yw ]
  [ 1  ]       [ 0  0  1  0 ] [ 0^T  1 ] [ Zw ]
                                          [ 1  ]

General Pin-Hole Camera Equation:

[ xi ]       [ (R1 Xw + tx) / (R3 Xw + tz) ]   [ x0 ]
[ yi ] = f   [ (R2 Xw + ty) / (R3 Xw + tz) ] + [ y0 ]

where Ri denotes the i-th row of R.
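The world-to-pixel chain above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the slides' own code; the example point and intrinsic values are hypothetical:

```python
import numpy as np

def project(Xw, R, t, f, kx, ky, x0, y0):
    """Map a 3-D world point to pixel coordinates with the pin-hole model.
    R, t are the extrinsic parameters; f, kx, ky, x0, y0 the intrinsics."""
    Xc = R @ Xw + t                   # world -> camera coordinates
    xc = f * Xc[0] / Xc[2]            # perspective projection onto the
    yc = f * Xc[1] / Xc[2]            # image plane at distance f
    xi = kx * xc + x0                 # camera -> pixel coordinates
    yi = -ky * yc + y0                # sign flip on the y axis (see C above)
    return np.array([xi, yi])

# A point 10 units in front of an axis-aligned camera (R = I, t = 0)
p = project(np.array([1.0, 2.0, 10.0]), np.eye(3), np.zeros(3),
            f=1.0, kx=100.0, ky=100.0, x0=320.0, y0=240.0)
```

The division by Xc[2] is what makes the mapping nonlinear in Cartesian coordinates, matching the general pin-hole equation.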

Perspective Projection (Special Case)

(Figure: image plane, lens center at distance f, and a 3-D point (X1, X2, X3) projecting to (x1, x2).)

The camera coordinate system is aligned with the world coordinate system. By similar triangles,

x1 / f = -X1 / (X3 - f)   and   x2 / f = -X2 / (X3 - f)

so that

x1 = f X1 / (f - X3)   and   x2 = f X2 / (f - X3)

Weak-Perspective Projection

Let Zi = R3^T X + Dz. Then the perspective projection is given by

[ x ]       [ (R1^T X + Dx) / Zi ]   [ ox ]
[ y ] = f   [ (R2^T X + Dy) / Zi ] + [ oy ]

If the average distance of the object from the camera, Zave = R3^T Xave + Dz, is such that |Zi - Zave| << Zave, then

[ x ]   ( f / Zave ) [ R1^T X + Dx ]   [ ox ]
[ y ] =              [ R2^T X + Dy ] + [ oy ]

Affine Camera

An uncalibrated weak-perspective projection:

[ x1 ]   [ T11  T12  T13  T14 ] [ X1 ]
[ x2 ] = [ T21  T22  T23  T24 ] [ X2 ]
[ x3 ]   [ 0    0    0    T34 ] [ X3 ]
                                 [ X4 ]

In Cartesian coordinates,

x = M X + t

where M is a 2 x 3 matrix with elements Mij = Tij / T34, and t = [ T14/T34  T24/T34 ]^T.

Orthographic Projection

Let the image plane be parallel to the X1-X2 plane of the world coordinate system. Then, in Cartesian coordinates,

x1 = X1   and   x2 = X2

or, in vector-matrix notation,

[ x1 ]   [ 1  0  0 ] [ X1 ]
[ x2 ] = [ 0  1  0 ] [ X2 ]
                      [ X3 ]

All rays from the 3-D object (scene) to the image plane are parallel to each other.

PHOTOMETRIC IMAGE FORMATION

If a Lambertian surface with constant albedo is illuminated by a single point source, the image intensity under orthographic projection is given by

sc(x1, x2, t) = L . N(t)

where L = (L1, L2, L3) is the unit vector in the mean illuminant direction, and N is the unit surface normal of the scene at position (X1, X2, X3(x1, x2)), given by

N = (-p, -q, 1) / (p^2 + q^2 + 1)^(1/2)

in which p = dX3/dx1 and q = dX3/dx2 are the partial derivatives of the depth X3 with respect to the image coordinates x1 and x2, respectively.
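The shading model above is easy to evaluate directly. The sketch below (illustrative; unit albedo assumed) computes L . N from the depth gradients p and q:

```python
import numpy as np

def lambertian_intensity(p, q, L):
    """Image intensity L . N for a Lambertian surface with unit albedo,
    where p = dX3/dx1 and q = dX3/dx2 are the depth gradients."""
    N = np.array([-p, -q, 1.0]) / np.sqrt(p**2 + q**2 + 1.0)
    return float(np.dot(L, N))

L = np.array([0.0, 0.0, 1.0])                    # illuminant along +X3
s_frontal = lambertian_intensity(0.0, 0.0, L)    # fronto-parallel patch
s_tilted = lambertian_intensity(1.0, 0.0, L)     # patch tilted about x2
```

The fronto-parallel patch (p = q = 0) gives full intensity, while tilting the surface away from the light reduces it by the cosine of the tilt, which is exactly the Lambertian cosine law.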

(Figure: photometric model, showing the image intensity sc(x1, x2, t), the surface normal N(t), and the illumination direction L.)

Note that the illuminant direction can also be expressed in terms of tilt and slant angles as

L = (L1, L2, L3) = (cos tau sin sigma, sin tau sin sigma, cos sigma)

where tau, the tilt angle of the illuminant, is the angle between the projection of L onto the X1-X2 plane and the X1 axis, and sigma, the slant angle, is the angle between L and the positive X3 axis.

Photometric Effect of 3-D Motion

Assuming that the mean illuminant direction L remains constant, we can express the change in intensity due to photometric effects of the motion as

d sc(x1, x2, t) / dt = L . dN/dt

Approximate dN/dt at the point (X1, X2, X3) as

dN/dt = N(X1', X2', X3') - N(X1, X2, X3)
      = (-p', -q', 1) / (p'^2 + q'^2 + 1)^(1/2)  -  (-p, -q, 1) / (p^2 + q^2 + 1)^(1/2)

where

p' = dX3'/dx1'   and   q' = dX3'/dx2'

are the depth gradients after the motion, which can be related to p and q through the chain rule using the 3-D motion model.


LECTURE 3 SPATIO-TEMPORAL SAMPLING

1. Spatio-Temporal Sampling
   - 2-D Sampling Structures for Analog Video
   - 3-D Sampling Structures for Digital Video
   - Analog-to-Digital Conversion
2. Spectral Characterization of Sampled Video
   - 2-D Sampling on a Rectangular Grid
   - 2-D/3-D Sampling on a Lattice
3. Reconstruction of Continuous Video from Samples
   - Digital-to-Analog Conversion

c 1995-97 This material is the property of A. M. Tekalp. It is intended for use only as a teaching aid when teaching a regular semester or quarter based course at an academic institution using the textbook "Digital Video Processing" (ISBN 0-13-190075-7) by A. M. Tekalp. Any other use of this material is strictly prohibited.



Spatio-Temporal Sampling

(Block diagram: RGB source -> RGB-to-YUV -> NTSC encoder -> composite signal -> NTSC decoder -> YUV-to-RGB -> display. Sampling the composite signal vs. the component signals.)

Consider the image plane intensity distribution sc(x1, x2, t) as a function of three continuous variables. Then,

- for analog storage and transmission, it is sampled in two dimensions (usually x2 and t) by means of the scanning process, and
- for digital processing, storage, and transmission, in all three dimensions.


2-D Sampling Structures

Analog Progressive Video:

V = [ Dx2   0  ]
    [ 0     Dt ]

Analog 2:1 Interlaced Video:

V = [ 2 Dx2   Dx2  ]
    [ 0       Dt/2 ]

(Each dot in the figures indicates a continuous line of video perpendicular to the plane of the page.)


3-D Sampling Structures

Progressive Sampling:

V = [ Dx1   0     0  ]
    [ 0     Dx2   0  ]
    [ 0     0     Dt ]

Vertically Aligned 2:1 Line-Interlaced Sampling:

V = [ Dx1   0       0    ]
    [ 0     2 Dx2   Dx2  ]
    [ 0     0       Dt/2 ]

(Each dot in the figures indicates a pixel location; the numbers indicate the time of sampling.)


Field-Quincunx Sampling and Line-Quincunx Sampling

(Figures: field-quincunx and line-quincunx sampling patterns, each specified by a sampling matrix V and, for line quincunx, an additional offset vector c; the numbers indicate the time of sampling.)

[1] E. Dubois, "The sampling and reconstruction of time-varying imagery with application in video systems," Proc. IEEE, vol. 73, no. 4, pp. 502-522, Apr. 1985.


Analog-to-Digital Conversion

- The minimum sampling frequency is 2 x 4.2 = 8.4 MHz (the Nyquist rate).
- The sampling rate should be an integral multiple of the line rate, so that samples in successive lines are aligned.
- For sampling the composite signal, the sampling frequency must be an integral multiple of the subcarrier frequency. This simplifies decoding (composite to RGB) of the sampled signal.
- For sampling component signals, there should be a single rate for the 525/30 and 625/50 systems; i.e., the sampling rate should be an integral multiple of both line rates, 29.97 x 525 = 15,734 Hz and 25 x 625 = 15,625 Hz.

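The 13.5 MHz component sampling rate adopted by CCIR-601 satisfies the "single rate for both systems" requirement above. A short check with exact rational arithmetic (the 525-line rate is exactly 4.5 MHz / 286):

```python
from fractions import Fraction

line_525 = Fraction(4_500_000, 286)   # 525/59.94 line rate, ~15,734 Hz
line_625 = Fraction(15_625)           # 625/50 line rate, Hz

fs = Fraction(13_500_000)             # CCIR-601 luminance sampling rate, Hz

samples_525 = fs / line_525           # samples per line, 525-line system
samples_625 = fs / line_625           # samples per line, 625-line system
```

Both quotients come out as integers (858 and 864), which are exactly the total samples per line quoted in the component-signal table below.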


Sampling the Composite Signal

                                         NTSC 3fsc    NTSC SMPTE 244M   PAL 4fsc
Bandwidth (MHz)                          4.2          4.2               5.5
Subcarrier/sampling frequency (MHz)      3.58/10.74   3.58/14.32        4.43/17.72
Total/active samples/line                682/576      910/768           1134/939
Bitrate (Mbps)                           85.9         114.5             141.8

Sampling Component Signals (4:2:2)

                                         525/59.94 SMPTE 125M    625/50
Luminance sampling frequency (MHz)       13.5                    13.5
Luminance total/active samples/line      858/720                 864/720
Luminance bitrate (Mbps)                 108                     108
Chrominance sampling frequency (MHz)     6.75                    6.75
Chrominance total/active samples/line    429/355                 432/358
Chrominance bitrate (Mbps)               54                      54


Chrominance Formats for Digital Video

(Figure: relative positions of the Y, U, and V samples in the 4:4:4, 4:2:2, and 4:2:0 formats.)


2-D Sampling on a Rectangular Grid

With rectangular sampling, we sample at the locations

x1 = n1 Dx1
x2 = n2 Dx2

where Dx1 and Dx2 are the sampling distances in the x1 and x2 directions, respectively.

The sampled signal can be expressed as

s(n1, n2) = sc(n1 Dx1, n2 Dx2)
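A minimal sketch of rectangular sampling (frequencies and spacings chosen for illustration): evaluate a continuous 2-D sinusoid sc(x1, x2) = cos(2 pi (F1 x1 + F2 x2)) on the grid x1 = n1 Dx1, x2 = n2 Dx2:

```python
import numpy as np

F1, F2 = 2.0, 3.0        # continuous frequencies, cycles per unit length
dx1, dx2 = 0.1, 0.1      # sampling distances

n1, n2 = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
# s(n1, n2) = sc(n1*dx1, n2*dx2)
s = np.cos(2 * np.pi * (F1 * n1 * dx1 + F2 * n2 * dx2))
```

The resulting discrete signal is cos(2 pi (f1 n1 + f2 n2)) with normalized frequencies f1 = F1 Dx1 = 0.2 and f2 = F2 Dx2 = 0.3, which previews the F -> f normalization used in the spectral analysis that follows.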

2-D Fourier Transform of Continuous Signals

Sc(F1, F2) = Int_{-inf}^{inf} Int_{-inf}^{inf} sc(x1, x2) exp{-j 2 pi (F1 x1 + F2 x2)} dx1 dx2

sc(x1, x2) = Int_{-inf}^{inf} Int_{-inf}^{inf} Sc(F1, F2) exp{ j 2 pi (F1 x1 + F2 x2)} dF1 dF2

2-D Fourier Transform of Discrete Signals

S(f1, f2) = Sum_{n1=-inf}^{inf} Sum_{n2=-inf}^{inf} s(n1, n2) exp{-j 2 pi (f1 n1 + f2 n2)}

s(n1, n2) = Int_{-1/2}^{1/2} Int_{-1/2}^{1/2} S(f1, f2) exp{ j 2 pi (f1 n1 + f2 n2)} df1 df2

Spectrum of the Sampled Signal

Evaluate the inverse Fourier transform expression at the sampling locations:

s(n1, n2) = Int Int Sc(F1, F2) exp{ j 2 pi (F1 n1 Dx1 + F2 n2 Dx2)} dF1 dF2

Define f1 = F1 Dx1 and f2 = F2 Dx2. Then

s(n1, n2) = (1 / (Dx1 Dx2)) Int Int Sc(f1/Dx1, f2/Dx2) exp{ j 2 pi (f1 n1 + f2 n2)} df1 df2

Next, break the integration over the (f1, f2) plane into a sum of integrals, each over a square denoted by SQ(k1, k2):

s(n1, n2) = Sum_{k1} Sum_{k2} Int Int_{SQ(k1,k2)} (1 / (Dx1 Dx2)) Sc(f1/Dx1, f2/Dx2) exp{ j 2 pi (f1 n1 + f2 n2)} df1 df2

where SQ(k1, k2) is defined as

-1/2 + k1 <= f1 < 1/2 + k1   and   -1/2 + k2 <= f2 < 1/2 + k2

A change of variables f1' = f1 - k1 and f2' = f2 - k2 shifts all the squares SQ(k1, k2) down to (-1/2, 1/2] x (-1/2, 1/2]:

s(n1, n2) = Sum_{k1} Sum_{k2} Int_{-1/2}^{1/2} Int_{-1/2}^{1/2} (1 / (Dx1 Dx2)) Sc((f1 - k1)/Dx1, (f2 - k2)/Dx2) exp{ j 2 pi (f1 n1 + f2 n2)} exp{-j 2 pi (k1 n1 + k2 n2)} df1 df2

But exp{-j 2 pi (k1 n1 + k2 n2)} = 1 for k1, k2, n1, n2 integers. Thus, the frequencies (f1 - k1, f2 - k2) map onto (f1, f2). Compare the last expression with

s(n1, n2) = Int_{-1/2}^{1/2} Int_{-1/2}^{1/2} S(f1, f2) exp{ j 2 pi (f1 n1 + f2 n2)} df1 df2

to conclude that

S(f1, f2) = (1 / (Dx1 Dx2)) Sum_{k1} Sum_{k2} Sc((f1 - k1)/Dx1, (f2 - k2)/Dx2),   for -1/2 <= f1, f2 < 1/2
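The replication result says that continuous frequencies differing by an integer multiple of 1/Dx are indistinguishable after sampling. A small numeric check (1-D slice for clarity; values illustrative):

```python
import numpy as np

dx = 0.1                       # sampling distance
n = np.arange(64)

F_low = 2.0                    # cycles per unit length
F_alias = F_low + 1 / dx       # shifted by one replication period 1/dx

s1 = np.cos(2 * np.pi * F_low * n * dx)
s2 = np.cos(2 * np.pi * F_alias * n * dx)   # identical samples: aliasing
```

The two cosines agree sample-for-sample, since the extra phase 2 pi (1/dx) n dx = 2 pi n is a whole number of cycles; the replicas of Sc fold onto each other.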

(Figure: sampling on a 2-D rectangular grid. (a) The spectrum Sc(F1, F2) of the analog signal, band-limited to B. (b) The sampling grid with spacings Dx1 and Dx2. (c) The spectrum Sp(F1, F2) of the sampled signal, with replicas spaced 1/Dx1 and 1/Dx2 apart.)

2-D Periodic Sampling with Arbitrary Geometry

An arbitrary periodic sampling geometry can be defined by the vectors v1 = (v11, v21)^T and v2 = (v12, v22)^T, such that

x1 = v11 n1 + v12 n2
x2 = v21 n1 + v22 n2

(Figure: arbitrary periodic sampling geometry generated by the vectors v1 and v2.)

In vector-matrix form,

x = V n

where x = (x1, x2)^T, n = (n1, n2)^T, and V = [v1 | v2] is the sampling matrix. Thus, the sampled signal can be expressed as

s(n) = sc(V n)

1) The sampling matrix V for a given grid is not unique: V^ = V E, where E is an integer matrix with |det E| = 1, is also a sampling matrix for that grid.
2) The quantity |det V| is unique and denotes the reciprocal of the sampling density.
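Both remarks are easy to verify numerically. In this sketch (the particular V and E are illustrative), right-multiplying by a unimodular integer matrix E changes the basis but leaves |det V|, and hence the sampling density, unchanged:

```python
import numpy as np

V = np.array([[2.0, 1.0],
              [0.0, 1.0]])          # sampling matrix [v1 | v2]

E = np.array([[1, 1],
              [0, 1]])              # integer matrix with |det E| = 1

V_hat = V @ E                       # another sampling matrix for the same grid

density = 1.0 / abs(np.linalg.det(V))   # sampling density (samples per area)
```

Since det(V E) = det(V) det(E) and |det E| = 1, the two matrices generate the same lattice density.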

2-D Fourier Transform Relations in Vector Form

Sc(F) = Int_{-inf}^{inf} sc(x) exp{-j 2 pi F^T x} dx

sc(x) = Int_{-inf}^{inf} Sc(F) exp{ j 2 pi F^T x} dF

S(f) = Sum_{n=-inf}^{inf} s(n) exp{-j 2 pi f^T n}

s(n) = Int_{-1/2}^{1/2} Int_{-1/2}^{1/2} S(f) exp{ j 2 pi f^T n} df

where F = (F1, F2)^T and f = (f1, f2)^T. The integrations and summations in these relations are double integrations and summations.

Spectrum of the Sampled Signal

Similar to the case of rectangular sampling, express

s(n) = sc(V n) = Int_{-inf}^{inf} Sc(F) exp{ j 2 pi F^T V n} dF

Making the change of variables f = V^T F,

s(n) = (1 / |det V|) Int_{-inf}^{inf} Sc(V^{-T} f) exp{ j 2 pi f^T n} df

where dF = df / |det V| using the Jacobian. Expressing the integration over the f plane as a sum of integrations over the squares (-1/2, 1/2] x (-1/2, 1/2], we have

s(n) = Sum_k Int_{-1/2}^{1/2} Int_{-1/2}^{1/2} (1 / |det V|) Sc(V^{-T} (f - k)) exp{ j 2 pi f^T n} exp{-j 2 pi k^T n} df

where exp{-j 2 pi k^T n} = 1 for k an integer-valued vector. Comparing this expression with

s(n) = Int_{-1/2}^{1/2} Int_{-1/2}^{1/2} S(f) exp{ j 2 pi f^T n} df

we conclude that

S(f) = (1 / |det V|) Sum_k Sc(V^{-T} (f - k))

or, equivalently,

Sp(F) = (1 / |det V|) Sum_k Sc(F - U k)

where the periodicity matrix U satisfies

U^T V = I

and I is the identity matrix. The periodicity matrix can be expressed as U = [u1 | u2], where u1 and u2 are the periodicity vectors. Note that the above formulation is also valid for rectangular sampling, with the matrices V and U diagonal.

(Figure: sampling on an arbitrary 2-D periodic grid. (a) The spectrum Sc(F1, F2) of the analog signal, band-limited to B. (b) The sampling grid generated by v1 and v2. (c) The spectrum of the sampled signal, with replicas at the sites of the reciprocal lattice generated by u1 and u2.)

Sampling on 3-D Lattices

Let v1, v2, v3 be linearly independent vectors in the 3-D Euclidean space R^3. A lattice Lambda in R^3 is the set of all linear combinations of v1, v2, v3 with integer coefficients:

Lambda = { n1 v1 + n2 v2 + k v3 | n1, n2, k in Z }

In vector-matrix notation, let V be the sampling matrix

V = [v1 | v2 | v3]

then

Lambda = { V [n1 n2 k]^T | (n1, n2, k) in Z^3 }

A spatio-temporal signal sc(x, t) sampled on a lattice can be expressed as

s(n, k) = sc(V [n1 n2 k]^T),   (n1, n2, k) in Z^3

Observe that d(Lambda) = |det(V)| denotes the reciprocal of the sampling density, and V is not unique.

Reciprocal Lattice

Given a lattice Lambda, the set of all vectors r such that r^T [x; t] is an integer for all (x, t) in Lambda is called the reciprocal lattice Lambda* of Lambda. A basis for Lambda* is the set of vectors u1, u2, u3 determined by

u_i^T v_j = delta_ij,   i, j = 1, 2, 3

or, equivalently,

U^T V = I

where I is the 3x3 identity matrix.

Unit Cell (Voronoi Cell)

The set of points that are closer to the origin than to any other sample point.

(Figure: the Voronoi cell of a 2-D lattice in the (x1, x2) plane.)

Fourier Transform on a Lattice

Let s(n, k) = sc(V [n1 n2 k]^T), (n1, n2, k) in Z^3. Then

S(f) = Sum_{(n,k) in Z^3} s(n, k) exp{-j 2 pi f^T [n; k]},   f in R^3

and

s(n, k) = Int_{(-1/2,1/2]^3} S(f) exp{ j 2 pi f^T [n; k]} df,   (n, k) in Z^3

where f = V^T F is the normalized frequency. The Fourier transform of a signal sampled on a lattice is periodic, with the replications centered at the sites of the reciprocal lattice Lambda*. Note that f in (-1/2, 1/2] x (-1/2, 1/2] x (-1/2, 1/2] implies that F = U f in P, where P denotes the unit cell of the reciprocal lattice Lambda*.

Spectrum of Signals Sampled on a Lattice

Suppose that sc(x, t) is in L^2(R^3), with the Fourier transform

Sc(F) = Int_{R^3} sc(x, t) exp{-j 2 pi F^T [x; t]} dx dt,   F in R^3

and the inverse transform

sc(x, t) = Int_{R^3} Sc(F) exp{ j 2 pi F^T [x; t]} dF,   (x, t) in R^3

The Fourier transform of the sampled signal is equal to an infinite sum of copies of the analog spectrum, shifted according to the reciprocal lattice:

Sp(F) = (1 / d(Lambda)) Sum_{k in Z^3} Sc(F + U k),   where U^T V = I

Example: Progressive and 2:1 line-interlaced sampling lattices.

(Figure: (a) the progressive and (b) the 2:1 line-interlaced sampling lattices; successive fields of the interlaced lattice are Dt/2 apart.)

The periodicity matrices indicating the locations of the replications are

U_pro = V_pro^{-T} = [ 1/Dx1   0         0    ]
                     [ 0       1/Dx2     0    ]
                     [ 0       0         1/Dt ]

and

U_int = V_int^{-T} = [ 1/Dx1   0         0    ]
                     [ 0       1/(2Dx2)  0    ]
                     [ 0      -1/Dt      2/Dt ]
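The periodicity matrices above follow mechanically from U = V^{-T}. A quick check with NumPy (normalized spacings Dx1 = Dx2 = Dt = 1 for readability):

```python
import numpy as np

dx1, dx2, dt = 1.0, 1.0, 1.0

V_pro = np.diag([dx1, dx2, dt])                  # progressive lattice
V_int = np.array([[dx1, 0.0,     0.0],
                  [0.0, 2 * dx2, dx2],
                  [0.0, 0.0,     dt / 2]])       # 2:1 line-interlaced lattice

U_pro = np.linalg.inv(V_pro).T                   # periodicity matrix, U^T V = I
U_int = np.linalg.inv(V_int).T
```

Note that |det V_int| = |det V_pro|, so the two structures have the same overall sampling density even though the interlaced replicas land at different reciprocal-lattice sites.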

Sublattices

Let Lambda and Gamma be lattices. Lambda is a sublattice of Gamma if every point of Lambda is also a point of Gamma. Then, d(Lambda) is an integer multiple of d(Gamma). The quotient d(Lambda)/d(Gamma) is called the index of Lambda in Gamma, and is denoted by (Gamma : Lambda). If Lambda is a sublattice of Gamma, then Gamma* is a sublattice of Lambda*.

Cosets of a Lattice

The set

c + Lambda = { c + [x; t] | (x, t) in Lambda },   c in Gamma

is called a coset of Lambda in Gamma. Thus, a coset is a shifted version of the lattice Lambda.

Other Sampling Structures

The most general form of sampling structure that we will study is the union of certain cosets of a sublattice Lambda in a lattice Gamma:

Psi = Union_{i=1}^{P} (c_i + Lambda)

where c1, ..., cP is a set of vectors in Gamma such that c_i - c_j is not in Lambda for i != j. Note that Psi becomes a lattice if we take Lambda = Gamma and P = 1.

(Figure: a sampling structure formed as the union of two cosets of a lattice, with basis vectors v1, v2 and offset vector c.)

Spectrum of Signals Sampled on a Sampling Structure

Sp(F) = (1 / d(Lambda)) Sum_k g(k) Sc(F + U k)

where

g(k) = Sum_{i=1}^{P} exp{ j 2 pi k^T U^T c_i }

The function g(k) is constant over cosets of Gamma* in Lambda*, and may be zero for some of these cosets, so the corresponding shifted versions of the analog spectrum are not present.

(Figure: the reciprocal lattice and the surviving spectral replications in the (F1, F2) plane.)

Reconstruction from Samples on a Rectangular Grid

Band-limited reconstruction of the analog video requires ideal low-pass filtering:

Sr(F1, F2) = { Dx1 Dx2 S(F1 Dx1, F2 Dx2)   for |F1| < 1/(2 Dx1) and |F2| < 1/(2 Dx2)
             { 0                            otherwise

(Figure: the reconstruction filter, an ideal low-pass filter with passband |F1| < 1/(2 Dx1), |F2| < 1/(2 Dx2).)

Taking the inverse Fourier transform, we have

sr(x1, x2) = Int_{-1/(2 Dx1)}^{1/(2 Dx1)} Int_{-1/(2 Dx2)}^{1/(2 Dx2)} Dx1 Dx2 S(F1 Dx1, F2 Dx2) exp{ j 2 pi (F1 x1 + F2 x2)} dF1 dF2

Substituting the definition of S(F1 Dx1, F2 Dx2),

sr(x1, x2) = Int_{-1/(2 Dx1)}^{1/(2 Dx1)} Int_{-1/(2 Dx2)}^{1/(2 Dx2)} Dx1 Dx2 { Sum_{n1} Sum_{n2} s(n1, n2) exp{-j 2 pi (F1 Dx1 n1 + F2 Dx2 n2)} } exp{ j 2 pi (F1 x1 + F2 x2)} dF1 dF2

Rearranging the terms, we have

sr(x1, x2) = Sum_{n1} Sum_{n2} s(n1, n2) Int_{-1/(2 Dx1)}^{1/(2 Dx1)} Int_{-1/(2 Dx2)}^{1/(2 Dx2)} Dx1 Dx2 exp{ j 2 pi (F1 (x1 - n1 Dx1) + F2 (x2 - n2 Dx2))} dF1 dF2

Note that the integral evaluates to

h(x1, x2) = [ sin( pi (x1 - n1 Dx1)/Dx1 ) / ( pi (x1 - n1 Dx1)/Dx1 ) ] [ sin( pi (x2 - n2 Dx2)/Dx2 ) / ( pi (x2 - n2 Dx2)/Dx2 ) ]

which is the ideal interpolation function for rectangular sampling.

Reconstruction from Samples on a Lattice

Exact reconstruction of a continuous signal from its samples on a lattice Lambda is possible via ideal low-pass filtering over a unit cell P of Lambda*, provided that the original continuous image spectrum was confined to this unit cell. The ideal low-pass filtering can be expressed as

Sr(F) = { |det V| S(V^T F)   for F in P
        { 0                   otherwise

In the space domain, we have

sr(x, t) = Sum_{(n,k) in Z^3} s(n, k) h( [x; t] - V [n; k] )

where

h(x) = |det V| Int_P exp{ j 2 pi F^T x } dF

Here h(x) is the ideal interpolation function for the particular lattice geometry.


LECTURE 4 SAMPLING STRUCTURE CONVERSION


1. Video Standards Conversion 2. Interpolation and Decimation of 1-D Signals 3. Theory of Sampling Structure Conversion
c 1995-97 This material is the property of A. M. Tekalp. It is intended for use only as a teaching aid when teaching a regular semester or quarter based course at an academic institution using the textbook "Digital Video Processing" (ISBN 0-13-190075-7) by A. M. Tekalp. Any other use of this material is strictly prohibited.



Sampling Structure Conversion

(Block diagram: input sp(x1, x2, t) on structure Psi1 -> Sampling Structure Conversion -> output yp(x1, x2, t) on structure Psi2.)

This is a spatio-temporal interpolation/decimation problem.

Applications:
- Frame-rate conversion
- Deinterlacing (interlaced -> progressive)
- Interlacing
- NTSC-to-PAL transcoding, or vice versa
- Data compression (U, V subsampling)


Fundamentals of Decimation/Interpolation

(Block diagram: s(n) -> Upsample 1:L -> Low-pass filter -> Downsample M:1 -> y(n); sampling rate change by a rational factor L/M.)

- Characterization in the frequency domain
- Filter design for interpolation/decimation

[1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, NJ, 1989.


Interpolation

Given s(n), define a signal u(n) that is upsampled by L:

u(n) = { s(n/L)   for n = 0, +/-L, +/-2L, ...
       { 0        otherwise

(Figure: (a) the signal s(n); (b) the upsampled signal u(n). Upsampling by L = 3.)
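The definition above can be sketched directly (NumPy, illustrative values):

```python
import numpy as np

def upsample(s, L):
    """Insert L-1 zeros between the samples of s (1:L upsampling)."""
    u = np.zeros(len(s) * L)
    u[::L] = s        # u(n) = s(n/L) at multiples of L, zero elsewhere
    return u

u = upsample(np.array([1.0, 2.0, 3.0]), 3)
```

The zero samples are placeholders to be filled in by the interpolation filter discussed next.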



Spectrum of the Upsampled Signal

U(f) = Sum_{n=-inf}^{inf} u(n) e^{-j 2 pi f n} = Sum_{n=-inf}^{inf} s(n) e^{-j 2 pi f L n} = S(f L)

(Figure: (a) the spectrum S(f); (b) the spectrum U(f), compressed by a factor of L, with images at multiples of 1/L. Upsampling by L = 3.)


Ideal Interpolation Filter

The ideal interpolation filter is an ideal low-pass filter.

(Figure: (a) the spectrum U(f) and the filter response H(f); (b) the spectrum Y(f) of the interpolated signal. Interpolation by L = 3.)

The impulse response of the ideal interpolation filter is a sinc function. Because of its zero-crossings, it will not alter the existing signal samples, while assigning values to the zero samples in the upsampled signal.


Practical Interpolation Filters

Zero-order hold (sample repeat):

(Figure: the rectangular impulse response h(n) and the convolution h(n-k) u(k), for L = 3.)

Linear interpolation:

(Figure: the triangular impulse response h(n), with values 1/3, 2/3, 1, 2/3, 1/3, and the convolution h(n-k) u(k), for L = 3.)
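Linear interpolation is just the upsampled signal convolved with the triangular impulse response shown above. A minimal sketch for L = 3 (illustrative data):

```python
import numpy as np

L = 3
h = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 3.0   # triangular impulse response

s = np.array([0.0, 3.0, 6.0])
u = np.zeros(len(s) * L)
u[::L] = s                                       # upsample 1:3
y = np.convolve(u, h)[L - 1:]                    # linearly interpolated signal
```

Because h(0) = 1 and h(n) = 0 at the other multiples of L, the original samples of s pass through unchanged, while the inserted zeros are replaced by straight-line values.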



Cubic Spline Interpolation

- Approximate the impulse response of the ideal low-pass filter (the sinc function) by three cubic polynomials.
- The frequency response is better than that of the truncated sinc function.

(Figure: the cubic-spline impulse response h(n) and the convolution h(n-k) u(k), for L = 3.)

Decimation

Given s(n), define an intermediate signal w(n):

w(n) = s(n) Sum_{k=-inf}^{inf} delta(n - kM)

Then, the decimated signal is

y(n) = w(Mn)

(Figure: (a) the signal s(n); (b) the intermediate signal w(n), which retains every M-th sample of s(n) and is zero elsewhere; (c) the decimated signal y(n). Decimation by M = 2.)


Spectrum of the Decimated Signal

W(f) = (1/M) Sum_{k=0}^{M-1} S(f - k/M)

Y(f) = Sum_{n=-inf}^{inf} w(Mn) e^{-j 2 pi f n} = W(f/M)

(Figure: (a) the spectrum S(f); (b) the spectrum W(f); (c) the stretched spectrum Y(f). Decimation by M = 2.)


Decimation Filters

To avoid aliasing, low-pass filter the signal before decimation.

(Figure: the spectra S(f), W(f), and Y(f) with a decimation filter applied before downsampling; antialias filtering for M = 2.)

Box filters are generally used instead of ideal low-pass filters, for simplicity.
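A minimal sketch of box-filtered decimation, in the spirit of the last remark (the ramp input is illustrative):

```python
import numpy as np

def decimate(s, M):
    """Box (moving-average) antialias filter followed by M:1 downsampling."""
    h = np.ones(M) / M                 # simple box low-pass filter
    w = np.convolve(s, h)[:len(s)]     # causal antialias filtering
    return w[::M]                      # keep every M-th sample

y = decimate(np.arange(16.0), 2)
```

The box filter averages each output sample over the M inputs it replaces; it is far from the ideal low-pass response, but cheap and often adequate.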


Rate Change by a Rational Factor

(Block diagram: s(n) -> Upsample 1:L -> Low-pass filter -> Downsample M:1 -> y(n); rate change by a factor of L/M.)

A single low-pass filter with cutoff frequency

fc = min{ 1/(2M), 1/(2L) }

is sufficient. When L > M, the requirement to preserve the values of the existing samples must be incorporated into the filter design.
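The cascade in the block diagram can be sketched as follows. This is an illustrative toy, with the linear-interpolation filter of the previous section standing in for a properly designed low-pass filter:

```python
import numpy as np

def resample(s, L, M, h):
    """Rate change by L/M: upsample 1:L, low-pass filter with h, downsample M:1."""
    u = np.zeros(len(s) * L)
    u[::L] = s                      # upsample 1:L
    w = np.convolve(u, h)[L - 1:]   # low-pass (interpolation) filtering
    return w[::M]                   # downsample M:1

h_lin = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 3.0   # linear-interp filter, L = 3
y = resample(np.array([0.0, 3.0, 6.0]), L=3, M=2, h=h_lin)
```

With L = 3 and M = 2, the output runs at 3/2 the input rate; since L > M here, the filter's unit sample at n = 0 is what keeps the original sample values intact before the 2:1 downsampling.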


Practical Method

(Figure: 3:4 line conversion, showing existing lines (x) and interpolated lines (o), as used in 525-line to 625-line conversion.)


Theory of Sampling Structure Conversion

We extend the notions of decimation and interpolation to conversion from one sampling structure (lattice) to another.

Sums of lattices:

Lambda1 + Lambda2 = { x + y | x in Lambda1 and y in Lambda2 }

Intersection of lattices:

Lambda1 ^ Lambda2 = { x | x in Lambda1 and x in Lambda2 }

The intersection Lambda1 ^ Lambda2 is the largest lattice which is a sublattice of both Lambda1 and Lambda2, while the sum Lambda1 + Lambda2 is the smallest lattice which contains both Lambda1 and Lambda2 as sublattices.

(Block diagram: sp on Lambda1 -> Upconvert U (to Lambda1 + Lambda2) -> up -> Low-pass filter -> wp -> Downconvert D (to Lambda2) -> yp. Decomposition of the system for sampling structure conversion.)

Define

up(x, t) = U sp(x, t) = { sp(x, t)   (x, t) in Lambda1
                        { 0          (x, t) in (Lambda1 + Lambda2), not in Lambda1

and

yp(x, t) = D wp(x, t) = wp(x, t),   (x, t) in Lambda2

Condition for the shift invariance of the filter: if the input is shifted by q, the output should also be shifted by q. We need q in Lambda1 ^ Lambda2. Thus, we assume that Lambda1 ^ Lambda2 is a lattice, i.e., that V1^{-1} V2 is a matrix of integers.

The Filter

The filtering operation can be expressed as

wp(x, t) = Sum_{(q, tau) in Lambda1 + Lambda2} up(q, tau) h( [x; t] - [q; tau] ),   (x, t) in Lambda1 + Lambda2

but up(x, t) = sp(x, t) for (x, t) in Lambda1 and zero otherwise, so

wp(x, t) = Sum_{(q, tau) in Lambda1} sp(q, tau) h( [x; t] - [q; tau] ),   (x, t) in Lambda1 + Lambda2

After the downsampling,

yp(x, t) = Sum_{(q, tau) in Lambda1} sp(q, tau) h( [x; t] - [q; tau] ),   (x, t) in Lambda2

One period of the filter frequency response is given by the unit cell of (Lambda1 + Lambda2)*. In order to avoid aliasing, the passband of the low-pass filter is restricted to the smaller of the Voronoi cells of Lambda1* and Lambda2*.

Example: Conversion from Lambda1 to Lambda2

(Figures: the lattices Lambda1, Lambda2, Lambda1 + Lambda2, and Lambda1 ^ Lambda2, each with its sampling matrix V.)

d(Lambda1) = 2 Dx1 Dx2 and d(Lambda2) = 4 Dx1 Dx2, so

Q = (Lambda1 + Lambda2 : Lambda1) = (Lambda2 : Lambda1 ^ Lambda2) = 2

(Figures: the reciprocal lattices Lambda1* and Lambda2*, the spectrum of s(x) with periodicity U, and the frequency response of the filter.)

One period of the filter frequency response is given by the unit cell of (Lambda1 + Lambda2)*. In order to avoid aliasing, the passband of the low-pass filter is restricted to the Voronoi cell of Lambda2*.

Example: Deinterlacing

(Figure: (a) the interlaced input grid; (b) the progressive output grid.)

The sampling matrices for the input and output grids are

V_in = [ Dx1   0       0    ]        V_out = [ Dx1   0     0    ]
       [ 0     2 Dx2   Dx2  ]                [ 0     Dx2   0    ]
       [ 0     0       Dt/2 ]                [ 0     0     Dt/2 ]

Note that |det V_in| = 2 |det V_out|.


Comments on Direct Methods

In direct methods for sampling structure down-conversion, there is a tradeoff between allowed aliasing errors and loss of resolution (blurring) due to low-pass filtering prior to down-conversion. When low-pass (antialias) filtering has been used prior to down-conversion, the resolution cannot be recovered by interpolation. Motion-compensated interpolation schemes make it possible to recover higher-resolution frames in the process of up-conversion, provided that no antialias filtering has been applied prior to down-conversion.


LECTURE 5 OPTICAL FLOW METHODS

1. Projected Motion vs. Optical Flow
2. Occlusion and Aperture Problems
3. Optical Flow Equation
4. 2-D Motion Field Models: Nonparametric vs. Parametric
5. Lucas-Kanade Method
6. Smoothness Constraint, Horn-Schunck Method
7. Adaptive Methods



Motion Estimation Problems with Applications

2-D Motion Estimation
- Correspondence estimation
- Optical flow estimation
- Applications: motion-compensated image filtering; motion-compensated image compression.

3-D Motion and Structure Estimation
- Based on point correspondences
- Optical flow-based or direct methods
- From stereo video
- Applications: virtual reality and synthetic-natural hybrid imaging; passive navigation, in which a camera moves with respect to a fixed environment and we determine the 3-D structure of the environment and the motion parameters of the camera.

Two-D Motion

(Figure: a 3-D point P projecting through the center of projection O onto the image-plane point p; as P moves over time, p moves correspondingly.)

There is 3-D motion between the objects in the scene and the camera. The "2-D motion" is also referred to as "projected motion."


2-D Displacement and Velocity Fields

The 2-D displacement field is a vector field consisting of the x1 and x2 components of the frame-to-frame "projected" displacement vectors d = [d1 d2]^T at each pixel.

(Figure: the point P = (x1, x2) at times t - Delta-t, t, and t + Delta-t, with the displacement vectors d between frames.)

The 2-D velocity field is a vector field consisting of the x1 and x2 components of the instantaneous velocity vectors at each pixel.


Optical Flow and Correspondence Fields

The observable variation of the 2-D image brightness pattern (the apparent 2-D velocity field) is called the optical flow. The set of vectors indicating the apparent displacement of pixels from frame to frame is called the correspondence field. The optical flow/correspondence field is, in general, different from the projected 2-D motion field due to:

- lack of sufficient spatial image gradient,
- changes in external illumination,
- changes in shading (due to rotation), etc.


Optical Flow vs. 2-D Velocity Field

- There must be sufficient gray-level variation within the moving objects.
- Changes in the illumination impair the estimation of the projected motion.

(Figure: frames k and k+1 of a rotating object, illustrating these cases.)


Optical Flow Estimation

Determination of the apparent velocity v(x1, x2, t) of pixels from a pair of time-sequential 2-D images. The flow vectors may vary with the coordinates (space-varying flow) due to 3-D rotation, zoom, etc.

Correspondence Problem

Finding the apparent displacement vectors d(x1, x2, t) between a pair of frames t and t' = t + Delta-t. Dense or feature correspondence estimation. (May also appear in the context of stereo disparity estimation.)

Image Registration (Special Case)

Given two frames that are globally shifted with respect to each other, estimate the shift. There is one displacement vector for the pair of frames.


2-D Motion/Optical Flow Estimation is Ill-Posed


Estimation of the optical ow (or the 2-D motion eld) given two frames, without additional assumptions, is \ill-posed." 1. Existence of a solution: No correspondence can be found at occlusion points (covered/uncovered background problem). 2. Uniqueness of the solution: If the 1 and 2 coordinates of the displacement (or velocity) at each pixel is treated as independent variables, then the number of unknowns is twice the number of observations - the elements of the frame di erence.
x x

Theoretically, we can determine only motion that is orthogonal to the spatial image gradient, called the normal ow, at any pixel (the aperture problem).

110

The Occlusion Problem

Occlusion refers to covering/uncovering of a surface due to motion of an object.

Example 1: an object translates between frames k and k+1 (figure):
- Background to be covered: no region in the next frame matches this region.
- Uncovered background: no motion vector points into this region.

Example 2: an object rotates about an axis parallel to the imaging plane.

The Aperture Problem

(Figure: a moving edge seen through two apertures; only the normal flow component is observable through Aperture 1.)

Basic Idea: We can only observe and determine the displacement component that is orthogonal to the edges (in the direction of the intensity gradient).

Optical Flow Equation (OFE)

If the intensity s_c(x1, x2, t) remains constant along a motion trajectory, we have

    d s_c(x1, x2, t) / dt = 0

where x1 and x2 vary with t according to the motion trajectory. Using the chain rule of differentiation,

    (∂s_c/∂x1)(x, t) v1(x) + (∂s_c/∂x2)(x, t) v2(x) + (∂s_c/∂t)(x, t) = 0

This is known as the optical flow equation or the optical flow constraint. It can alternatively be expressed as

    < ∇_x s_c(x, t), v(x) > + (∂s_c/∂t)(x, t) = 0

where ∇_x s_c(x, t) = [ (∂s_c/∂x1)(x, t)  (∂s_c/∂x2)(x, t) ]^T and <·,·> denotes the vector inner product.
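As a numerical sanity check on the optical flow equation, the following sketch (pure Python; the pattern and helper names are illustrative, not from the notes) differentiates a translating ramp and verifies that the left-hand side of the OFE vanishes at the true velocity.

```python
# Numerical check of the optical flow equation for a pattern translating
# at a known constant velocity (V1, V2).

V1, V2 = 1.5, -0.5          # true velocity components (pixels/unit time)

def s_c(x1, x2, t):
    """Intensity: a fixed spatial ramp translating with velocity (V1, V2)."""
    a, b = x1 - V1 * t, x2 - V2 * t
    return 2.0 * a + 3.0 * b

def partial(f, x1, x2, t, axis, h=1e-4):
    """Central-difference estimate of a spatio-temporal partial derivative."""
    d = [0.0, 0.0, 0.0]
    d[axis] = h
    return (f(x1 + d[0], x2 + d[1], t + d[2])
            - f(x1 - d[0], x2 - d[1], t - d[2])) / (2 * h)

x1, x2, t = 4.0, 7.0, 2.0
sx1 = partial(s_c, x1, x2, t, 0)
sx2 = partial(s_c, x1, x2, t, 1)
st  = partial(s_c, x1, x2, t, 2)

residual = sx1 * V1 + sx2 * V2 + st   # OFE left-hand side
print(abs(residual) < 1e-6)           # True: intensity is conserved
```

Any spatial pattern and velocity could be substituted; the residual stays zero as long as the intensity is a pure translation of a fixed pattern.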

Normal Flow

Is the OFE sufficient to uniquely specify the motion field? The OFE yields one scalar equation in two unknowns at each pixel. (Figure: in the (v1, v2) plane, the loci of v satisfying the optical flow equation form a line.)

The OFE determines, at each pixel, only the component of the flow vector that is in the direction of the spatial image intensity gradient, ∇_x s_c(x, t) / ||∇_x s_c(x, t)||, because the component orthogonal to the spatial image gradient disappears under the dot product. This normal component has magnitude

    v_⊥(x, t) = - (∂s_c(x, t)/∂t) / ||∇_x s_c(x, t)||

Motion Models

Because of the ill-posed nature of the problem, motion estimation algorithms use additional assumptions (models) about the structure of the 2-D motion field.

Non-parametric models: Some sort of smoothness or uniformity constraint on the 2-D motion field.

Quasi-parametric models: In 3-D rigid motion, six egomotion parameters constrain the local flow vector to lie along a specific line, while the local depth value is required to determine its exact value.

Parametric models: 3-D rigid motion of the image of a planar surface under orthographic projection can be described by a 6-parameter affine model, while under perspective projection it can be described by an 8-parameter nonlinear model. There exist more complicated models for quadratic surfaces.

Nonparametric 2-D Motion Estimation Methods

Methods Based on the OFE: Constant intensity along the motion trajectory yields an equation in terms of spatio-temporal intensity gradients. Used in conjunction with appropriate spatio-temporal smoothness constraints.

Phase-Correlation Method: The linear term of the Fourier phase difference between consecutive frames determines the motion estimates.

Block Matching Method: Matching fixed-size blocks between two frames based on a distance criterion. Extension to feature matching (e.g., edges, corners).

Pel-Recursive Methods: Gradient-based minimization of the displaced frame difference. Implicit use of a smoothness constraint. Extension to Wiener-type motion estimation.

Bayesian Methods: Probabilistic smoothness constraint in the form of Gibbs random fields.

Methods using the OFE

COLOR IMAGES: The OFE can be imposed at each color band separately. Thus, the displacement vector is effectively constrained in three different directions, since the direction of the spatial gradient vector at each band is in general different.

MONOCHROMATIC IMAGES: The solution space for the displacement vector can be reduced by using an appropriate smoothness constraint, which requires the displacement vector to vary slowly over a neighborhood.

Second-Order Differential Methods

In search of another constraint to determine both components of the flow vector at each pixel, some proposed the conservation of the spatial image gradient, ∇_x s_c(x, t), stated by

    d ∇_x s_c(x, t) / dt = 0

An estimate of the flow field is then given by

    [ v̂1(x, t) ]     [ ∂²s_c/∂x1²     ∂²s_c/∂x1∂x2 ]⁻¹ [ ∂²s_c/∂t∂x1 ]
    [ v̂2(x, t) ] = - [ ∂²s_c/∂x1∂x2   ∂²s_c/∂x2²   ]    [ ∂²s_c/∂t∂x2 ]

where all partials are evaluated at (x, t).

Lucas-Kanade Method

The Block Motion Model:

    v(x, t) = v(t) = [ v1(t)  v2(t) ]^T   for x ∈ B

Define the error in the OFE over the block of pixels B as

    E = Σ_{x∈B} [ (∂s_c(x, t)/∂x1) v1(t) + (∂s_c(x, t)/∂x2) v2(t) + ∂s_c(x, t)/∂t ]²

Minimization of E with respect to v1(t) and v2(t) yields

    [ v̂1(t) ]   [ Σ (∂s_c/∂x1)²             Σ (∂s_c/∂x1)(∂s_c/∂x2) ]⁻¹ [ -Σ (∂s_c/∂x1)(∂s_c/∂t) ]
    [ v̂2(t) ] = [ Σ (∂s_c/∂x1)(∂s_c/∂x2)   Σ (∂s_c/∂x2)²           ]    [ -Σ (∂s_c/∂x2)(∂s_c/∂t) ]

where all sums are over x ∈ B and all partials are evaluated at (x, t).
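A minimal Lucas-Kanade sketch in pure Python: the 2x2 normal equations are accumulated from gradients of an illustrative pattern (chosen so the true block velocity satisfies the OFE exactly) and solved by Cramer's rule. All names here are for illustration only.

```python
# Lucas-Kanade block velocity estimate from the 2x2 normal equations.

V1, V2 = 0.8, -0.3   # true block velocity to be recovered

def gradients(x1, x2):
    # Spatial gradients of f(a,b) = a^2 + 3ab + 2b^2 at t = 0, and the
    # temporal gradient implied by pure translation at (V1, V2).
    gx1 = 2 * x1 + 3 * x2
    gx2 = 3 * x1 + 4 * x2
    gt = -(V1 * gx1 + V2 * gx2)
    return gx1, gx2, gt

block = [(i, j) for i in range(4) for j in range(4)]  # 4x4 block B
a11 = a12 = a22 = b1 = b2 = 0.0
for x1, x2 in block:
    gx1, gx2, gt = gradients(x1, x2)
    a11 += gx1 * gx1; a12 += gx1 * gx2; a22 += gx2 * gx2
    b1 -= gx1 * gt;   b2 -= gx2 * gt

det = a11 * a22 - a12 * a12   # zero det would signal the aperture problem
v1 = (a22 * b1 - a12 * b2) / det
v2 = (a11 * b2 - a12 * b1) / det
print(round(v1, 6), round(v2, 6))   # recovers (0.8, -0.3)
```

Note the role of the determinant: if all gradients in the block were parallel (insufficient gray level variation), the system would be singular, which is exactly the aperture problem discussed earlier.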

Horn-Schunck Method

Minimize a weighted sum of the error in the OFE and a measure of departure from smoothness in the motion field,

    min_{v(x)} E = ∫_Λ ( E²_of(v(x)) + α² E²_s(v(x)) ) dx

to estimate the velocity vector at each pixel, where Λ denotes the image support,

    E_of(v(x)) = < ∇_x s_c(x, t), v(x) > + ∂s_c(x, t)/∂t

and

    E²_s(v(x)) = ||∇v1(x)||² + ||∇v2(x)||²
               = (∂v1/∂x1)² + (∂v1/∂x2)² + (∂v2/∂x1)² + (∂v2/∂x2)²

The parameter α² (chosen heuristically) is a weight that controls the strength of the smoothness constraint: larger values of α² increase the strength of the constraint, whereas smaller values relax it.

The minimization of the functional E, using the calculus of variations, and approximation of the Laplacian of the velocity components by linear highpass filters, yields the following iterations:

    v1^(n+1)(x, t) = v̄1^(n)(x, t) - (∂s_c/∂x1) [ (∂s_c/∂x1) v̄1^(n)(x, t) + (∂s_c/∂x2) v̄2^(n)(x, t) + ∂s_c/∂t ]
                                      / [ α² + (∂s_c/∂x1)² + (∂s_c/∂x2)² ]

    v2^(n+1)(x, t) = v̄2^(n)(x, t) - (∂s_c/∂x2) [ (∂s_c/∂x1) v̄1^(n)(x, t) + (∂s_c/∂x2) v̄2^(n)(x, t) + ∂s_c/∂t ]
                                      / [ α² + (∂s_c/∂x1)² + (∂s_c/∂x2)² ]

where all partials are evaluated at the point (x, t) and v̄1^(n), v̄2^(n) denote local (neighborhood) averages of the current estimates. The initial estimates v1^(0)(x, t) and v2^(0)(x, t) can be obtained by the block matching technique. In the digital implementation of the algorithm, the derivatives are numerically estimated.
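A single-site sketch of the Horn-Schunck update (pure Python, illustrative names): with α = 0 the update projects the neighborhood averages exactly onto the OFE constraint line, while a large α² keeps the new estimate close to the averages.

```python
# One Horn-Schunck update at a single site.

def hs_update(v1_avg, v2_avg, gx1, gx2, gt, alpha2):
    """Return updated (v1, v2) from neighborhood averages and gradients."""
    r = gx1 * v1_avg + gx2 * v2_avg + gt      # OFE residual at the averages
    denom = alpha2 + gx1 ** 2 + gx2 ** 2
    return v1_avg - gx1 * r / denom, v2_avg - gx2 * r / denom

gx1, gx2, gt = 2.0, 3.0, -4.0                 # example gradients

# alpha2 = 0: pure projection onto the OFE line -> residual becomes 0.
v1, v2 = hs_update(0.0, 0.0, gx1, gx2, gt, alpha2=0.0)
residual = gx1 * v1 + gx2 * v2 + gt
print(abs(residual) < 1e-12)                  # True

# Large alpha2: the smoothness term dominates and the step shrinks.
w1, w2 = hs_update(0.0, 0.0, gx1, gx2, gt, alpha2=100.0)
print(w1 ** 2 + w2 ** 2 < v1 ** 2 + v2 ** 2)  # True: smaller step
```

This makes the role of α² concrete: it trades fidelity to the OFE against fidelity to the neighborhood average.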

Finite Differences Method

Possible estimates: forward difference, backward difference, average difference, or a local average of the average differences. Horn and Schunck proposed averaging four finite differences:

    ∂s_c/∂x1 ≈ (1/4) { s_c(x1+1, x2, t) - s_c(x1, x2, t) + s_c(x1+1, x2+1, t) - s_c(x1, x2+1, t)
                       + s_c(x1+1, x2, t+1) - s_c(x1, x2, t+1) + s_c(x1+1, x2+1, t+1) - s_c(x1, x2+1, t+1) }

    ∂s_c/∂x2 ≈ (1/4) { s_c(x1, x2+1, t) - s_c(x1, x2, t) + s_c(x1+1, x2+1, t) - s_c(x1+1, x2, t)
                       + s_c(x1, x2+1, t+1) - s_c(x1, x2, t+1) + s_c(x1+1, x2+1, t+1) - s_c(x1+1, x2, t+1) }

    ∂s_c/∂t ≈ (1/4) { s_c(x1, x2, t+1) - s_c(x1, x2, t) + s_c(x1+1, x2, t+1) - s_c(x1+1, x2, t)
                      + s_c(x1, x2+1, t+1) - s_c(x1, x2+1, t) + s_c(x1+1, x2+1, t+1) - s_c(x1+1, x2+1, t) }
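The averaged four-difference formulas above can be exercised on a 2x2x2 cube of samples (pure Python sketch; the ramp and indexing convention `s[t][x1][x2]` are illustrative). On a linear ramp the estimates equal the exact partials.

```python
# Horn-Schunck averaged finite differences on a 2x2x2 sample cube.

def ramp(x1, x2, t):
    return 5.0 * x1 + 2.0 * x2 - 3.0 * t   # exact partials: 5, 2, -3

s = [[[ramp(x1, x2, t) for x2 in (0, 1)] for x1 in (0, 1)] for t in (0, 1)]

def hs_diffs(s):
    """Average of four forward differences along each of x1, x2 and t."""
    dx1 = 0.25 * sum(s[t][1][x2] - s[t][0][x2] for t in (0, 1) for x2 in (0, 1))
    dx2 = 0.25 * sum(s[t][x1][1] - s[t][x1][0] for t in (0, 1) for x1 in (0, 1))
    dt  = 0.25 * sum(s[1][x1][x2] - s[0][x1][x2] for x1 in (0, 1) for x2 in (0, 1))
    return dx1, dx2, dt

print(hs_diffs(s))   # (5.0, 2.0, -3.0) on this linear ramp
```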

Local Polynomial Fitting Method

Approximate s_c(x1, x2, t) locally by a linear combination of some low-order polynomials in x1, x2 and t, i.e.,

    ŝ_c(x1, x2, t) = Σ_{i=0}^{N-1} a_i φ_i(x1, x2, t)

where N is the number of basis polynomials, a_i are the coefficients of the linear superposition, and φ_i(x1, x2, t) are the basis polynomials. Set N = 9, with the following basis functions:

    1, x1, x2, t, x1², x2², x1 x2, x1 t, x2 t

Then,

    ŝ_c(x1, x2, t) = a0 + a1 x1 + a2 x2 + a3 t + a4 x1² + a5 x2² + a6 x1 x2 + a7 x1 t + a8 x2 t.

The coefficients a_i, i = 0, ..., 8, are estimated by the least squares method, which minimizes the error function

    e² = Σ_{n1} Σ_{n2} Σ_{n3} ( s_c(x1, x2, t) - Σ_{i=0}^{N-1} a_i φ_i(x1, x2, t) )² |_{x1=n1, x2=n2, t=n3}

with respect to these coefficients. The summation is over a local neighborhood of the pixel; a typical case involves 50 pixels, i.e., 5x5 spatial windows in two consecutive frames. Once the coefficients a_i are estimated, the image gradients follow by simple differentiation:

    ∂s_c(x1, x2, t)/∂x1 = a1 + 2 a4 x1 + a6 x2 + a7 t,   which equals a1 at x1 = x2 = t = 0
    ∂s_c(x1, x2, t)/∂x2 = a2 + 2 a5 x2 + a6 x1 + a8 t,   which equals a2 at x1 = x2 = t = 0
    ∂s_c(x1, x2, t)/∂t  = a3 + a7 x1 + a8 x2,            which equals a3 at x1 = x2 = t = 0

Estimating the coefficients a1, a2 and a3 of the three first-degree basis polynomials is sufficient to estimate the gradients.
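The fit can be sketched with a least-squares solve, assuming numpy is available (the quadratic test pattern and window sizes are illustrative). Since the pattern lies in the span of the 9 basis functions, a1, a2 and a3 recover the gradients at the window center exactly.

```python
# Least-squares fit of the 9-term polynomial basis over a 5x5x2 neighborhood.
import numpy as np

def intensity(x1, x2, t):
    # Quadratic pattern with known gradients (2, -1, 0.5) at the origin.
    return 7 + 2 * x1 - 1 * x2 + 0.5 * t + 0.3 * x1 ** 2 + 0.1 * x1 * x2

rows, vals = [], []
for t in (0, 1):                      # two consecutive frames
    for x1 in range(-2, 3):           # 5x5 spatial window, centered at 0
        for x2 in range(-2, 3):
            rows.append([1, x1, x2, t, x1 * x1, x2 * x2, x1 * x2, x1 * t, x2 * t])
            vals.append(intensity(x1, x2, t))

a, *_ = np.linalg.lstsq(np.array(rows, float), np.array(vals), rcond=None)
print([round(c, 6) for c in a[1:4]])  # gradients a1, a2, a3
```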

Adaptive Methods

The Horn-Schunck algorithm imposes the optical flow and smoothness constraints globally on the entire image (or over the motion estimation window). (Figure: frames k and k+1 with a background region to be covered, which no region in the next frame matches, and an uncovered background region, into which no motion vector points.)

The smoothness constraint does not hold in the direction perpendicular to an occlusion boundary. Several researchers proposed to impose the smoothness constraint along, but not perpendicular to, the occlusion boundaries. These methods require the detection of moving object (occlusion) boundaries.

LECTURE 6 BLOCK-BASED METHODS

1. Phase-Correlation Method
2. Block-Matching Algorithms
   - Full-Search
   - Three-Step Algorithm
   - Cross-Search Algorithm
3. Hierarchical Motion Estimation
4. Motion Estimation with Spatial Transformations
   - Generalized Block-Matching
   - Extension of the Lucas-Kanade Method

Block Translation Model

Assume frame k+1 is a globally (at least on a block-by-block basis) shifted version of frame k:

    s(n1, n2, k+1) = s(n1 + d1, n2 + d2, k)

1) To overcome the aperture problem, there must be sufficient gray level variation within the block.
2) This model is used in many practical applications, including:
   - world standards for video compression such as H.261 and MPEG, and
   - motion-compensated filtering in standards conversion, etc.

Phase Correlation Method

The correlation between frames k and k+1 is given by

    c_{k,k+1}(n1, n2) = s(n1, n2, k+1) ** s(-n1, -n2, k)

where ** denotes 2-D convolution. Taking the Fourier transform of both sides,

    C_{k,k+1}(f1, f2) = S_{k+1}(f1, f2) S_k*(f1, f2)

where * denotes complex conjugation. Normalizing C_{k,k+1}(f1, f2) by its magnitude,

    C̃_{k,k+1}(f1, f2) = S_{k+1}(f1, f2) S_k*(f1, f2) / | S_{k+1}(f1, f2) S_k*(f1, f2) |

Given the motion model

    S_{k+1}(f1, f2) = S_k(f1, f2) e^{-j2π(f1 d1 + f2 d2)}

we obtain

    C̃_{k,k+1}(f1, f2) = e^{-j2π(f1 d1 + f2 d2)}

    c̃_{k,k+1}(n1, n2) = δ(n1 - d1, n2 - d2)
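The derivation above can be exercised numerically, assuming numpy is available: for a cyclic shift, the inverse transform of the normalized cross-power spectrum is an impulse at the displacement (a small bias term in the denominator guards against division by zero, a practical detail not in the derivation).

```python
# Phase correlation on a cyclically shifted 64x64 block.
import numpy as np

rng = np.random.default_rng(0)
frame_k = rng.random((64, 64))
d1, d2 = 5, -3
frame_k1 = np.roll(frame_k, shift=(d1, d2), axis=(0, 1))  # cyclic shift

S_k = np.fft.fft2(frame_k)
S_k1 = np.fft.fft2(frame_k1)
C = S_k1 * np.conj(S_k)
C_tilde = C / (np.abs(C) + 1e-12)          # keep phase only
surface = np.real(np.fft.ifft2(C_tilde))   # impulse at (d1, d2) mod N

peak = np.unravel_index(np.argmax(surface), surface.shape)
# Fold peak coordinates into the range [-N/2 + 1, N/2]:
est = tuple(int(p) - 64 if p > 32 else int(p) for p in peak)
print(est)   # (5, -3)
```

With a non-cyclic shift the impulse degenerates into a broader peak, which is the boundary effect discussed on the next slide.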

Implementation Issues

Range of Displacement Estimates / Block Size: Since the DFT is periodic with the block size (N1, N2), the estimates wrap around:

    d̂_i = d_i          if |d_i| ≤ N_i/2 (N_i even) or |d_i| ≤ (N_i - 1)/2 (N_i odd)
    d̂_i = d_i - N_i    otherwise

The range of estimates is [-N_i/2 + 1, N_i/2] for N_i even. For example, to estimate displacements within a range [-31, 32], the block size should be at least 64 x 64.

Boundary Effects: To obtain a perfect impulse with the DFT, the shift must be cyclic. Since things disappearing at one end generally do not reappear at the other end, the impulses degenerate into peaks.

Comments on Phase Correlation

Multiple Moving Objects: Experiments indicate that multiple peaks are observed in such a case. An additional search is required to find which peak belongs to which part of the image.

Frame-to-Frame Intensity Changes: Shifts in the mean value or multiplication by a constant do not affect the Fourier phase. The method is insensitive to such changes.

Extension to include rotation is possible (although costly).

Block Matching Method

The displacement at the center of an N1 x N2 block in frame k is determined by searching for the location of the best matching block of the same size in frame k+1. The search is limited to a search window. (Figure: a block in frame k and its search window in frame k+1.)

Block matching algorithms differ in:
- Matching criteria (maximum cross-correlation, minimum error)
- Search strategy
- Determination of block size (hierarchical, adaptive)

Matching Criteria

Minimum Mean Square Error (MSE):

    MSE(d1, d2) = (1/(N1 N2)) Σ_{(n1,n2)∈B} [ s(n1 + d1, n2 + d2, k+1) - s(n1, n2, k) ]²

where B denotes an N1 x N2 block.

Minimum Mean Absolute Difference (MAD):

    MAD(d1, d2) = (1/(N1 N2)) Σ_{(n1,n2)∈B} | s(n1 + d1, n2 + d2, k+1) - s(n1, n2, k) |

The displacement estimate is d̂ = [ d̂1  d̂2 ]^T, the (d1, d2) which minimizes the MSE or MAD criterion.
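A full-search sketch with the MAD criterion in pure Python (frame construction and helper names are illustrative): every candidate displacement in the window is scored and the minimizer returned.

```python
# Full-search block matching with the MAD criterion.
import random

def mad(frame0, frame1, n1, n2, d1, d2, N=4):
    """Mean absolute difference between an NxN block and its displaced match."""
    total = 0
    for i in range(N):
        for j in range(N):
            total += abs(frame1[n1 + d1 + i][n2 + d2 + j] - frame0[n1 + i][n2 + j])
    return total / (N * N)

def full_search(frame0, frame1, n1, n2, M=3, N=4):
    """Evaluate all (2M+1)^2 candidates and return the MAD minimizer."""
    best = None
    for d1 in range(-M, M + 1):
        for d2 in range(-M, M + 1):
            score = mad(frame0, frame1, n1, n2, d1, d2, N)
            if best is None or score < best[0]:
                best = (score, (d1, d2))
    return best[1]

random.seed(1)
f0 = [[random.randint(0, 255) for _ in range(16)] for _ in range(16)]
f1 = [[f0[(i - 2) % 16][(j + 1) % 16] for j in range(16)] for i in range(16)]
print(full_search(f0, f1, 6, 6))   # (2, -1)
```

The cost of evaluating all (2M+1)(2M+1) candidates per block is what motivates the faster three-step and cross-search strategies on the following slides.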

Search Procedures

Usually the search area is limited to

    -M1 ≤ d1 ≤ M1   and   -M2 ≤ d2 ≤ M2

where M1 and M2 are predetermined integers.

Full Search: calls for the evaluation of the matching criterion at (2M1 + 1)(2M2 + 1) distinct points for each block.

Alternatives: Three-Step Search, Cross-Search.

Three-Step (Logarithmic) Search

(Figure: search points labeled 1, 2, 3 by step, illustrated for M1 = M2 = 7.)

The number of steps depends on the maximum displacement vector allowed and the accuracy of estimation; e.g., a range of ±32 pixels with 0.5-pixel accuracy would require 6 steps (±16, ±8, ±4, ±2, ±1, ±0.5 pixels).
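The three-step procedure can be sketched in a few lines of pure Python (the cost function here is a synthetic unimodal surrogate for the MAD surface, used only for illustration): nine candidates are scored around the current best, then the step size is halved.

```python
# Three-step (logarithmic) search for M = 7: step sizes 4, 2, 1.

def three_step(cost, steps=(4, 2, 1)):
    best = (0, 0)
    for s in steps:
        candidates = [(best[0] + i * s, best[1] + j * s)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        best = min(candidates, key=cost)   # keep the cheapest of 9 points
    return best

true_d = (5, -3)
cost = lambda d: (d[0] - true_d[0]) ** 2 + (d[1] - true_d[1]) ** 2
print(three_step(cost))   # (5, -3) for this unimodal cost
```

Only 9 + 8 + 8 evaluations are needed instead of the 225 of a full search with M = 7; on a real, non-unimodal MAD surface the search can of course be trapped in a local minimum.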

Cross-Search

(Figure: cross-shaped search pattern, points labeled by step.)

The distance between the search points is reduced if the best match is at the center of the cross or at the boundary of the search window.

Comments on Block Matching

Minimizing the MSE or MAD criterion can be viewed as imposing the optical flow constraint on the entire block: it is assumed that all pixels belonging to a block have a single translation vector, which is a special case of the local smoothness constraint (same as in the Lucas-Kanade method).

Block size selection: There are conflicting requirements on the size of the blocks.
- The block size should be sufficiently large; otherwise a match may be established between blocks containing similar gray-level patterns which are unrelated in the motion sense.
- The block size should be sufficiently small; if the motion vector varies within a block, block matching cannot provide accurate estimates.
' &

Digital Video Processing

c 1995-98 Prof. A. M. Tekalp

Hierarchical Image Representation


A hierarchical representation of the image sequence is formed using a simple low pass ltering operation at each level.

Level 3 Increasing resolution Level 2

Level 1

Decimation at each layer is optional.

137

Hierarchical Block Matching

Perform block matching at each level, starting with the lowest resolution image (highest level). Interpolate the result and pass it on to the next higher resolution image as an initial estimate.

The lower resolution levels serve to determine a rough estimate of the displacement using larger blocks. The higher resolution levels serve to fine-tune the displacement vector estimate. At higher resolution levels, a smaller window size can be used, since we start with a relatively good initial estimate.

Hierarchical Block Matching

(Figure: matching between frames k and k+1 at multiple resolutions.)

Typical Set of Parameters for 5-Level Hierarchical Block Matching:

    Level              1     2     3     4     5
    Filter Size        10    10    5     5     3
    Max Displacement   31    15    7     3     1
    Block Size         128   64    64    28    12

Hierarchical BM - An Example

The center of the search area in the second level (denoted by "0") marks the estimate passed down from the first level. (Figure: a 3-step search with M = 7 at Level 2 (lower resolution), followed by a 2-step search with M = 3 at Level 1 (higher resolution).)

The estimates in the two levels are [7 1]^T and [3 1]^T, respectively, resulting in an overall estimate of [10 2]^T.
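A two-level sketch of this refinement in pure Python (the cost function is a synthetic unimodal surrogate, and no decimation is used, so the coarse and fine contributions simply add, as in the example above):

```python
# Coarse three-step search (M = 7) followed by a fine two-step search
# (M = 3) centered on the coarse estimate.

def step_search(cost, center, steps):
    best = center
    for s in steps:
        cands = [(best[0] + i * s, best[1] + j * s)
                 for i in (-1, 0, 1) for j in (-1, 0, 1)]
        best = min(cands, key=cost)
    return best

true_d = (10, 2)
cost = lambda d: (d[0] - true_d[0]) ** 2 + (d[1] - true_d[1]) ** 2
coarse = step_search(cost, (0, 0), (4, 2, 1))   # limited to |d| <= 7
fine = step_search(cost, coarse, (2, 1))        # refinement within +/-3
print(coarse, fine)
```

The coarse pass alone cannot reach a displacement of 10 (it is limited to ±7); the refinement pass closes the remaining gap.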

Shortcomings of Block Matching

The model is translational motion only (2 parameters): it cannot handle rotation or zooming. (Figure: a rotating patch between frames k and k+1 that no translated block can match.) Accuracy is essential in motion-compensated filtering.

The motion field is discontinuous at block boundaries, which causes blocking artifacts in motion-compensated compression.

Spatial Transformations

Consider block-based image warping by:
- Affine motion model (6-parameter)
- Perspective or bilinear motion model (8-parameter)

(Figure: a square block mapped under affine, perspective, and bilinear transformations.)

Motion Estimation with Spatial Transformations

Generalized block matching:
- Search method (Seferidis and Ghanbari)
- Algebraic method (extension of the Lucas-Kanade method)

2-D mesh modeling (motion continuity across block boundaries):
- Hexagonal search (Nakaya et al.)
- Constrained linear estimation (Altunbasak and Tekalp)

Generalized Block Matching

(Figure: texture mapping of a deformed quadrilateral in frame k-1 onto a square block in frame k.)

Search over all combinations of the coordinates of the corners in the reference frame to minimize the SAD with respect to the current frame.

Algebraic Method

Extension of the Lucas-Kanade method to parametric motion models.

Affine Motion Model:

    v1(x1, x2) = a1 x1 + a2 x2 + a3
    v2(x1, x2) = a4 x1 + a5 x2 + a6

Substitute v1 and v2 in the sum of errors in the OFE over the block B:

    E = Σ_{(x1,x2)∈B} [ I_{x1}(x1, x2) v1(x1, x2) + I_{x2}(x1, x2) v2(x1, x2) + I_t(x1, x2) ]²

where I_{x1}, I_{x2} and I_t denote the spatial and temporal image gradients. Differentiate E with respect to a1, ..., a6 and set the results equal to zero to obtain six linear equations in six unknowns.

Extension of Lucas-Kanade Method (cont'd)

Define, at each pixel, the vector g = [ x1 I_{x1},  x2 I_{x1},  I_{x1},  x1 I_{x2},  x2 I_{x2},  I_{x2} ]^T. Setting the six partial derivatives of E to zero yields the symmetric 6x6 linear system

    ( Σ_{(x1,x2)∈B} g g^T ) [ â1 â2 â3 â4 â5 â6 ]^T = - Σ_{(x1,x2)∈B} I_t g

whose entries are block sums of products of the gradients weighted by the coordinates, such as Σ x1² I²_{x1}, Σ x1 x2 I_{x1} I_{x2} and Σ x1 I_{x1} I_t.
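The 6x6 system can be sketched as follows, assuming numpy is available. The per-pixel gradients here are synthetic (random), with the temporal gradient chosen so that a known affine parameter vector satisfies the OFE exactly; the solve then recovers it.

```python
# Algebraic (affine) extension of Lucas-Kanade: accumulate and solve
# (sum g g^T) a = -(sum I_t g) with g = [x1*Ix1, x2*Ix1, Ix1, x1*Ix2, x2*Ix2, Ix2].
import numpy as np

true_a = np.array([0.02, -0.01, 0.5, 0.015, 0.03, -0.2])  # affine parameters
rng = np.random.default_rng(7)

A = np.zeros((6, 6))
b = np.zeros(6)
for x1 in range(8):
    for x2 in range(8):
        ix1, ix2 = rng.normal(size=2)            # synthetic spatial gradients
        v1 = true_a[0] * x1 + true_a[1] * x2 + true_a[2]
        v2 = true_a[3] * x1 + true_a[4] * x2 + true_a[5]
        it = -(ix1 * v1 + ix2 * v2)              # OFE holds exactly at true_a
        g = np.array([x1 * ix1, x2 * ix1, ix1, x1 * ix2, x2 * ix2, ix2])
        A += np.outer(g, g)
        b -= it * g

a_hat = np.linalg.solve(A, b)
print(np.allclose(a_hat, true_a))   # True
```

With real image gradients the system is only approximately consistent, and the solve gives the least-squares affine motion over the block.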

Two-D Mesh Modeling

(Figure: texture mapping of triangular patches between frame k-1 and frame k.)

- Affine motion with triangular patches.
- Hexagonal matching (search).
- Constrained linear estimation: all constraints are linear.

Hexagonal Matching

There are six lines intersecting at each node in the case of a uniform triangular mesh. The boundaries of these six triangles define a hexagon. Perturb each node point to yield the smallest SAD within its hexagon.

LECTURE 7 PEL RECURSIVE METHODS

1. Minimization by Gradient Descent
2. Netravali-Robbins Algorithm
3. Walker-Rao Algorithm
4. Wiener-based Estimation

Displaced Frame Difference (DFD)

Let d(x, t) = [ d1(x, t)  d2(x, t) ]^T denote the displacement field at x = [x1 x2]^T between frames t and t + Δt. The DFD function between these two frames is defined as

    dfd(x, d̂) = s_c(x + d̂(x, t), t + Δt) - s_c(x, t)

where s_c(·) denotes the spatio-temporal image intensity distribution.

If the components of d̂ take noninteger values, interpolation is required to compute the dfd at each pixel location.

If d̂ is equal to the true displacement vector and there are no interpolation errors, the dfd attains the value of zero at that pixel location.
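A minimal dfd sketch with bilinear interpolation for noninteger displacements (pure Python; frame construction and helper names are illustrative). Because bilinear interpolation is exact on a linear ramp, the dfd vanishes at the true sub-pixel displacement here.

```python
# dfd with bilinear interpolation for sub-pixel displacements.

def bilinear(img, y, x):
    """Bilinearly interpolate img (list of rows) at real coordinates (y, x)."""
    i, j = int(y), int(x)
    a, b = y - i, x - j
    return ((1 - a) * (1 - b) * img[i][j] + (1 - a) * b * img[i][j + 1]
            + a * (1 - b) * img[i + 1][j] + a * b * img[i + 1][j + 1])

def dfd(frame_t, frame_t1, x1, x2, d1, d2):
    return bilinear(frame_t1, x1 + d1, x2 + d2) - frame_t[x1][x2]

# frame_t1 is frame_t shifted by exactly (0.5, 0.25) on a linear ramp:
ramp = lambda y, x: 3.0 * y + 2.0 * x
frame_t = [[ramp(y, x) for x in range(8)] for y in range(8)]
frame_t1 = [[ramp(y - 0.5, x - 0.25) for x in range(8)] for y in range(8)]
print(dfd(frame_t, frame_t1, 3, 3, 0.5, 0.25))   # 0.0
```

On real imagery the interpolator introduces the interpolation error mentioned above, so the dfd at the true displacement is small but not exactly zero.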

Relation between DFD and OFE

Expanding the dfd into a Taylor series about (x, t), for d(x) and Δt small,

    s_c(x1 + d1(x), x2 + d2(x), t + Δt) = s_c(x, t) + d1(x) ∂s_c(x, t)/∂x1
                                          + d2(x) ∂s_c(x, t)/∂x2 + Δt ∂s_c(x, t)/∂t + h.o.t.

Neglecting the h.o.t. and setting dfd(x, d̂) = 0, we obtain

    (∂s_c(x, t)/∂x1) d̂1(x) + (∂s_c(x, t)/∂x2) d̂2(x) + Δt ∂s_c(x, t)/∂t = 0

Dividing both sides by Δt and taking the limit Δt → 0:

    (∂s_c(x, t)/∂x1) v̂1(x) + (∂s_c(x, t)/∂x2) v̂2(x) + ∂s_c(x, t)/∂t = 0

Comments

In the case of constant velocity motion, where d1(x) = v1(x) Δt and d2(x) = v2(x) Δt, the optical flow equation is satisfied when the displaced frame difference function attains the value of zero.

In practice, neither the dfd nor the error in the OFE is exactly zero, because there is observation noise, scene illumination may vary with time, there are occlusion regions, and there are interpolation errors.

Therefore, one aims to minimize the absolute value or the square of the dfd, or of the left-hand side of the OFE, to obtain an estimate of the frame-to-frame motion field.

PEL-RECURSIVE ALGORITHMS

Pel-recursive algorithms are of the general form

    d^{i+1}(x) = d^i(x) + u^i(x)

where d^i(x) is the estimated motion vector at the pel location x in the i-th step, u^i(x) is the update term in the i-th step, and d^{i+1}(x) is the new estimate.

The update term u^i(x) is estimated, at each pel x, to minimize a positive-definite function E of the dfd with respect to d. The iterations may be executed at a single pel (pixel) position, at consecutive pel positions, or a combination of both.

The motion estimate at the previous pel is taken as the initial estimate at the next pel; hence "pel-recursive."

Minimization by Gradient Descent

A straightforward way to minimize a function is to set its derivatives to zero:

    ∇_d E(x, d) = 0

where ∇_d is the gradient operator with respect to d, i.e., the set of partial derivatives. The following equations must be solved simultaneously:

    ∂E(x, d)/∂d1 = 0
    ∂E(x, d)/∂d2 = 0

Since an analytical solution to these equations cannot be found in general, we resort to iterative methods.

The gradient vector points AWAY from the minimum; that is, in one dimension, its sign is positive on an "uphill" slope. Thus, to get closer to the minimum, we can update our current vector as

    d^{(k+1)}(x) = d^{(k)}(x) - ε ∇_d E(x, d)|_{d^{(k)}(x)}

where ε is some positive scalar, known as the step size. (Figure: E(d) versus d, showing step sizes that are too small and too large around the minimum.)

If ε is too small, the iteration will take too long to converge; if it is too large, the algorithm will become unstable and start oscillating about the minimum.

Newton-Raphson Method

We can estimate a good value for the step size using the well-known Newton-Raphson method for root finding:

    d^{(k+1)}(x) = d^{(k)}(x) - H⁻¹ ∇_d E(x, d)|_{d^{(k)}(x)}

where H is the Hessian matrix,

    H_{ij} = ∂²E(x, d) / ∂d_i ∂d_j

In one dimension, we would like to find a root of E'(d). Expanding E'(d) in a Taylor series about the point d^{(k)}:

    E'(d^{(k+1)}) = E'(d^{(k)}) + (d^{(k+1)} - d^{(k)}) E''(d^{(k)})

Since we want d^{(k+1)} to be a zero of E', we set

    E'(d^{(k)}) + (d^{(k+1)} - d^{(k)}) E''(d^{(k)}) = 0

Thus,

    d^{(k+1)} - d^{(k)} = - E'(d^{(k)}) / E''(d^{(k)})
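The one-dimensional Newton-Raphson step can be exercised on a simple criterion (pure Python; the example function E(d) = (d-2)^4 + (d-2)^2 is illustrative, with its minimizer at d = 2):

```python
# Newton-Raphson iteration d <- d - E'(d)/E''(d) for a 1-D criterion.

def E1(d):  # E'(d) for E(d) = (d - 2)^4 + (d - 2)^2
    return 4 * (d - 2) ** 3 + 2 * (d - 2)

def E2(d):  # E''(d)
    return 12 * (d - 2) ** 2 + 2

d = 0.0
for _ in range(20):
    d -= E1(d) / E2(d)
print(round(d, 6))   # 2.0
```

Note that E''(d) > 0 everywhere here, so each step is well-defined; near a flat or concave region the Hessian-scaled step can fail, which is one reason pel-recursive schemes often fall back to a fixed or gradient-normalized step size.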

Local vs. Global Minima

Gradient descent suffers from a serious problem: its solution is strongly dependent on the starting point. If we start in a "valley," the iteration will be stuck at the bottom of that valley, which may be only a "local" minimum; we have no way of getting out of it to reach the "global" minimum.

More sophisticated optimization methods, such as simulated annealing, are needed to be able to reach the global minimum regardless of the starting point. However, these more sophisticated optimization methods usually require a lot more processing time.

Netravali-Robbins Algorithm

The Netravali-Robbins algorithm finds an estimate of the displacement vector at each pixel to minimize

    E(x, d) = [ dfd(x, d) ]²

A steepest descent approach to the minimization problem yields the iteration

    d^{i+1}(x) = d^i(x) - (ε/2) ∇_d [ dfd(x, d^i) ]²
               = d^i(x) - ε dfd(x, d^i) ∇_d dfd(x, d^i)

where ∇_d is the gradient with respect to d and ε is the step size. Since

    ∇_d dfd(x, d^i) = ∇_x s_c(x - d^i, t - Δt)

the estimate becomes

    d^{i+1}(x) = d^i(x) - ε dfd(x, d^i) ∇_x s_c(x - d^i, t - Δt)
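A one-dimensional analogue of this iteration (pure Python sketch; the 2-D version replaces scalars with vectors, and the signal, shift and step size are illustrative): steepest descent on the squared dfd recovers a sub-pixel displacement of a smooth signal.

```python
# 1-D analogue of the Netravali-Robbins iteration: minimize
# dfd(x, d)^2 = (s(x + d, 1) - s(x, 0))^2 by steepest descent.
import math

TRUE_D = 0.7
def s(x, t):
    return math.sin(0.3 * (x - TRUE_D * t))   # frame t: shifted sinusoid

def s_prime(x, t, h=1e-5):
    return (s(x + h, t) - s(x - h, t)) / (2 * h)

x, d, eps = 10.0, 0.0, 1.0
for _ in range(200):
    dfd = s(x + d, 1) - s(x, 0)               # displaced frame difference
    d -= eps * dfd * s_prime(x + d, 1)        # step along the gradient
print(round(d, 4))   # 0.7
```

The fixed step size ε makes convergence slow when the gradient is small, which is precisely what the Walker-Rao choice of step size on the next slide addresses.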

Walker and Rao Algorithm

Walker and Rao suggested the following step size:

    ε = 1 / ||∇_x s_c(x - d^i, t - Δt)||²

This is motivated by the observation that the update term
- should be large when |dfd(·)| is large and ||∇_x s_c(·)|| is small, and
- should be small when |dfd(·)| is small and ||∇_x s_c(·)|| is large.

Cafforio and Rocca added a bias term η² to avoid division by zero in areas of constant intensity:

    ε = 1 / ( ||∇_x s_c(x - d^i, t - Δt)||² + η² )

Extension to Multiple Pixel Support

If we assume that the displacement remains constant over a support M containing several pixels, we can minimize the dfd over the support as opposed to on a pixel-by-pixel basis:

    E_M(d_M) = Σ_{x∈M} [ dfd(x, d_M) ]²

This results in the following estimator:

    d_M^{i+1} = d_M^i - (ε/2) ∇_d Σ_{x∈M} [ dfd(x, d_M^i) ]²

where d_M^{i+1} denotes the new displacement estimate over the entire support M.

Wiener-based Estimation Algorithm

Linear minimum mean square error (LMMSE) estimation of the update term u^i, based on a neighborhood M of a pel. (Extension of the multiple-pel version of the Netravali-Robbins algorithm.)

Linearization of the dfd at the N pels of the support:

    dfd(x^(1), d^i) = -∇^T s_c(x^(1) - d^i, t - Δt) u^i + v(x^(1), d^i)
    dfd(x^(2), d^i) = -∇^T s_c(x^(2) - d^i, t - Δt) u^i + v(x^(2), d^i)
        ...
    dfd(x^(N), d^i) = -∇^T s_c(x^(N) - d^i, t - Δt) u^i + v(x^(N), d^i)

Expressing this set of equations as z = Φ u_M + v, where Φ collects the (negated) gradient rows, the LMMSE estimate of the update term is given by

    û_M = [ Φ^T R_v⁻¹ Φ + R_u⁻¹ ]⁻¹ Φ^T R_v⁻¹ z

where û_M denotes the update term for the entire support.

The solution requires knowledge of the covariance matrices of both the update, R_u, and the linearization error, R_v. Assuming that R_u = σ_u² I and R_v = σ_v² I,

    û_M = [ Φ^T Φ + (σ_v²/σ_u²) I ]⁻¹ Φ^T z

and

    d_M^{i+1} = d_M^i + [ Φ^T Φ + (σ_v²/σ_u²) I ]⁻¹ Φ^T z

Note that the assumptions used to arrive at the simplified estimator are not in general true; e.g., the linearization error is not uncorrelated with the update term, and the updates and the linearization errors at each pixel are not uncorrelated with each other. However, experimental results indicate better performance than other pel-recursive estimators.

Remarks on Pel-Recursive Methods

The "pel-recursive" nature of the algorithm can be considered an implicit smoothness constraint. The effectiveness of this constraint increases especially when a small number of iterations is performed at each pixel.

The aperture problem also exists in pel-recursive algorithms: the update term is a vector along the direction of the gradient of the image intensity, so no correction is performed in the direction perpendicular to the gradient vector.

Pel-recursive algorithms can be applied hierarchically, using multi-resolution representations of images, for improved motion estimation.

LECTURE 8 BAYESIAN METHODS

1. Introduction to Markov Random Fields and the Gibbs Distribution
2. Optimization Methods
   - Simulated Annealing (SA): Metropolis algorithm and Gibbs sampler
   - Iterated Conditional Modes (ICM)
   - Mean Field Annealing (MFA)
3. MAP Motion Estimation
   - Basic Formulation
   - Discontinuity Models
   - Estimation Algorithms

MARKOV AND GIBBS RANDOM FIELDS

MRFs are extensions of 1-D causal Markov chains to 2-D. MRFs were traditionally specified by local conditional probabilities, which limited their usage. Recently it has been shown that every MRF can be described by a Gibbs distribution; hence the Gibbs random field (GRF).

Bayesian estimation methods can be developed using GRFs as a priori signal models for complex image processing applications such as motion estimation and segmentation.

Since Bayesian estimation requires global optimization of a cost function, we study a number of optimization methods, including simulated annealing (SA), iterated conditional modes (ICM), and highest confidence first (HCF).

Definitions

Let a random field z = { z(x), x ∈ Λ } be specified over a lattice Λ, and let ω ∈ Ω denote a realization of the random field z. The random field z(x) can be continuous- or discrete-valued; that is, ω(x) ∈ R or ω(x) ∈ Γ = {0, 1, ..., L-1} for all x ∈ Λ, respectively.

A neighborhood system on Λ: the set N_x denotes the neighborhood of the site x and has the properties (i) x ∉ N_x, and (ii) x_j ∈ N_{x_i} ⟺ x_i ∈ N_{x_j}, where x_i and x_j denote arbitrary sites in the lattice. (In words, x does not belong to its own set of neighbors, and if x_j is a neighbor of x_i, then x_i is a neighbor of x_j, and vice versa.) The neighborhood system over Λ is then defined as N = { N_x, x ∈ Λ }.

Examples of Neighborhood Systems

(Figure: two example neighborhood systems, (a) and (b).)

A clique C is defined as a subset C ⊂ Λ such that all pairs of sites in C are neighbors. Further, 𝒞 denotes the set of all cliques.

Markov Random Fields (MRF)

The random field z = { z(x) } is an MRF with respect to N if

    p(z) > 0 for all z = ω, and
    p( z(x_i) | z(x_j), ∀ x_j ≠ x_i ) = p( z(x_i) | z(x_j), x_j ∈ N_{x_i} ).

(In words, the first condition implies all realizations have non-zero pdf, while the second states that the conditional pdf at a particular site depends only on its neighborhood.)

Difficulties with MRF models: (i) the joint pdf p(z) cannot be easily related to local properties, and (ii) it is hard to determine when a set of functions p( z(x_i) | z(x_j), x_j ∈ N_{x_i} ), x_i ∈ Λ, are valid conditional pdfs [Geman and Geman].

Gibbs Random Fields (GRF)

A GRF with a neighborhood system $N$ and the associated set of cliques $\mathcal{C}$ is characterized by the joint pdf

discrete-valued:
$$p(z) = \frac{1}{Q} \sum_{\omega} e^{-U(z=\omega)/T}\,\delta(z - \omega), \qquad Q = \sum_{\omega} e^{-U(z=\omega)/T}$$

continuous-valued:
$$p(z) = \frac{1}{Q}\, e^{-U(z)/T}, \qquad Q = \int e^{-U(z)/T}\, dz$$

where $U(z)$, the Gibbs potential (Gibbs energy), is defined by
$$U(z) = \sum_{C \in \mathcal{C}} V_C(z(x) \mid x \in C).$$

Example: Spatial smoothness constraint using GRF

Let us use a 4-point neighborhood system and the 2-pixel cliques. Over a $4 \times 4$ lattice, there are a total of 24 such cliques. Let the 2-pixel clique potential be defined as
$$V_C(z(x_i), z(x_j)) = \begin{cases} -\beta & \text{if } z(x_i) = z(x_j) \\ +\beta & \text{otherwise} \end{cases}$$
where $\beta$ is a positive number.

[Figure: (a) the 24 two-pixel cliques over a $4 \times 4$ lattice; (b) a uniform label field, total potential $V = -24$; (c) a checkerboard label field, $V = 24$.]

Note that a lower potential means a higher probability.
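The clique potentials above admit a quick numerical check. A minimal sketch, assuming $\beta = 1$ and a 4-point neighborhood; the function name and toy label fields are illustrative, not from the notes:

```python
# Sketch: Gibbs energy of a label field under the two-pixel clique
# potential V = -beta if labels agree, +beta otherwise (4-neighborhood).
def gibbs_energy(labels, beta=1.0):
    """Sum clique potentials over horizontal and vertical pixel pairs."""
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # horizontal clique (i,j)-(i,j+1)
                energy += -beta if labels[i][j] == labels[i][j+1] else beta
            if i + 1 < rows:  # vertical clique (i,j)-(i+1,j)
                energy += -beta if labels[i][j] == labels[i+1][j] else beta
    return energy

uniform = [[2] * 4 for _ in range(4)]      # all labels equal
checker = [[1 if (i + j) % 2 else 2 for j in range(4)] for i in range(4)]
print(gibbs_energy(uniform))   # -24.0: all 24 cliques agree
print(gibbs_energy(checker))   # 24.0: all 24 cliques disagree
```

The two totals reproduce the $V = -24$ and $V = 24$ values quoted for the uniform and checkerboard configurations.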


Equivalence of GRF and MRF


Hammersley-Clifford (H-C) Theorem: Let $N$ be a neighborhood system. Then $z(x)$ is an MRF with respect to $N$ if and only if $p(z)$ is Gibbsian with respect to $N$.

The H-C theorem provides a simple and practical way to specify MRFs through Gibbs potentials. In general, an MRF is specified in terms of local conditional pdfs; note that there is no general method to obtain the joint pdf of an MRF from the local conditional pdfs [Besag]. The Gibbs distribution gives the joint pdf of $z$ directly, expressed in terms of the clique potentials, which capture the local interaction between pixels and can be assigned arbitrarily.

Obtaining Local Conditional pdfs from Gibbs Potentials

The local conditional pdf is defined as
$$p(z(x_i) \mid z(x_j),\ \forall x_j \neq x_i) = \frac{p(z)}{p(z(x_j),\ x_j \neq x_i)} = \frac{p(z)}{\sum_{z(x_i) \in \Gamma} p(z)}, \qquad \forall x_i \in \Lambda.$$
After some algebra,
$$p(z(x_i) \mid z(x_j),\ x_j \neq x_i) = \frac{1}{Q_{x_i}} \exp\left\{ -\frac{1}{T} \sum_{C \mid x_i \in C} V_C(z(x) \mid x \in C) \right\}$$
where
$$Q_{x_i} = \sum_{z(x_i) \in \Gamma} \exp\left\{ -\frac{1}{T} \sum_{C \mid x_i \in C} V_C(z(x) \mid x \in C) \right\}.$$
These local conditional pdfs are used, e.g., in the Gibbs sampler method for optimization.
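As a numerical illustration, the local conditional pmf at a site can be computed directly from the clique potentials that contain it. A sketch assuming only two-pixel cliques with the $\pm\beta$ potential of the earlier smoothness example; names and values are illustrative:

```python
# Local conditional pmf from clique potentials:
#   p(z_i = cand | neighbors) = exp(-U_i(cand)/T) / Q_xi,
# where U_i sums the potentials of cliques containing the site.
import math

def local_conditional(neighbors, labels, beta=1.0, temp=1.0):
    weights = []
    for cand in labels:
        # two-pixel cliques: -beta if neighbor agrees, +beta otherwise
        u = sum(-beta if n == cand else beta for n in neighbors)
        weights.append(math.exp(-u / temp))
    q = sum(weights)              # the local partition function Q_xi
    return [w / q for w in weights]

# A site whose 4-neighborhood is [1, 1, 1, 0]: label 1 is strongly favored
p = local_conditional([1, 1, 1, 0], labels=(0, 1))
print(p)    # approximately [0.018, 0.982]
```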


OPTIMIZATION METHODS
Many estimation/segmentation problems require the minimization of an energy function $E(d)$, where $d$ is some $N$-dimensional parameter vector. The value of $d$ that yields the minimal $E$ is denoted by
$$\hat{d} = \arg\min_{d} E(d).$$
This minimization is exceedingly difficult for image processing applications, due both to the dimension of the vectors involved and to the occurrence of local minima, because $E(d)$ is usually nonconvex.


Local vs. Global Minima

Gradient descent suffers from a serious problem: its solution depends strongly on the starting point. If it starts in a "valley," it will be stuck at the bottom of that valley; there is no way of getting out of that local minimum to reach the global minimum. Here we look at several optimization methods that are capable of finding the global optimum.

A. Simulated annealing (stochastic relaxation): the Metropolis algorithm and the Gibbs sampler (by Geman and Geman).
B. Iterated conditional modes (ICM) (by Besag).
C. Mean field annealing (MFA) (by Bilbro et al.).


Simulated Annealing
Simulated annealing, sometimes referred to as stochastic relaxation, belongs to the class of Monte Carlo methods. It enables us to find the global optimum of a nonconvex cost function of many variables. Here we describe two implementations: the original formulation of Metropolis, and the Gibbs sampler proposed by Geman and Geman. The computational load of simulated annealing is usually significant, especially when the number of elements in the unknown vector $d$ and the number of values in the set $\Gamma$ are large.


The Metropolis Algorithm

We start with an arbitrary initial vector $d$. At each iteration cycle, the components of $d$ are perturbed one by one by assigning each another value from the set $\Gamma$ at random. The order in which the components are perturbed is not important, as long as all components are perturbed in each iteration cycle. After each perturbation, the change in the total energy, $\Delta E$, is computed to determine whether the perturbation is accepted. A perturbation is accepted with probability
$$P = \begin{cases} \exp(-\Delta E / T) & \text{if } \Delta E > 0 \\ 1 & \text{if } \Delta E \leq 0 \end{cases}$$
where $T$ is the temperature parameter that controls the probability of accepting positive changes in the energy. Perturbations that lower the energy are always accepted. The rationale for accepting perturbations that increase the energy is to prevent the solution from settling into a local minimum.


If $T$ is relatively large, the probability of accepting a positive energy change is higher than when $T$ is small, for the same $\Delta E$. In the next iteration cycle, the temperature is lowered and the components are revisited. The process continues until the temperature has been lowered to near zero. A temperature "schedule," expressing temperature as a function of the iteration number, is therefore an important component of the stochastic relaxation process. Geman and Geman proposed the schedule
$$T = \frac{C}{\ln(k + 1)}$$
where $C$ is a constant and $k$ is the iteration cycle. This schedule is viewed as overly conservative, but it guarantees convergence to a global minimum. Schedules that lower the temperature at a faster rate have also been shown to work in practice.
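To see concretely why the logarithmic schedule is considered over-conservative, it can be compared with a geometric schedule. A small sketch; the constants ($C = 4.0$, $T_0 = 4.0$, $\alpha = 0.95$) are illustrative choices, not values from the notes:

```python
# Comparing the Geman-Geman logarithmic schedule with a faster
# geometric one.
import math

def log_schedule(c, k):
    """Geman-Geman schedule T = C / ln(k + 1)."""
    return c / math.log(k + 1)

def geometric_schedule(t0, alpha, k):
    """A commonly used faster alternative: T = T0 * alpha^k."""
    return t0 * alpha ** k

print(log_schedule(4.0, 10))                  # ~1.67 after 10 cycles
print(log_schedule(4.0, 10000))               # still ~0.43 after 10000 cycles
print(geometric_schedule(4.0, 0.95, 10000))   # essentially zero
```

Even after 10,000 cycles the logarithmic schedule has barely cooled, which is why faster schedules are used in practice despite losing the convergence guarantee.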


The Algorithm

1. Choose an initial value for $d = d^{(0)}$. Set $i = 0$ and $j = 1$.
2. Perturb the $j$th component of $d^{(i)}$ to generate the vector $d^{(i+1)}$.
3. Compute $\Delta E = E(d^{(i+1)}) - E(d^{(i)})$.
4. Compute the acceptance probability
$$P = \begin{cases} \exp(-\Delta E / T) & \text{if } \Delta E > 0 \\ 1 & \text{if } \Delta E \leq 0 \end{cases}$$
5. If $P < 1$, draw a random number uniformly distributed between 0 and 1. If the number drawn is less than $P$, accept the perturbation.
6. Set $j = j + 1$. If $j \leq N$, go to step 2 ($N$ is the number of components of $d$).
7. Set $i = i + 1$ and $j = 1$. Reduce $T$ according to the temperature schedule. If $T > T_{\min}$, go to step 2; otherwise terminate.
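The steps above can be sketched compactly. A minimal sketch on a toy energy: for speed it uses a geometric cooling schedule rather than the logarithmic one, and all constants (`t0`, `alpha`, `t_min`) and the toy energy are illustrative choices:

```python
# Sketch of the Metropolis iteration: perturb each component, accept
# energy increases with probability exp(-delta/T), cool, repeat.
import math
import random

def metropolis_minimize(energy, d, values, t0=2.0, alpha=0.9, t_min=0.05, seed=0):
    rng = random.Random(seed)
    d = list(d)
    e = energy(d)
    t = t0
    while t > t_min:                       # step 7: cooling loop
        for j in range(len(d)):            # steps 2-6: one full cycle
            old = d[j]
            d[j] = rng.choice(values)      # perturb component j
            e_new = energy(d)
            delta = e_new - e
            # accept if delta <= 0, else with probability exp(-delta/t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                e = e_new
            else:
                d[j] = old                 # reject: restore old value
        t *= alpha
    return d, e

# Toy energy: number of mismatches against a hidden target labeling
target = [0, 1, 1, 0, 1, 0, 0, 1]
d, e = metropolis_minimize(lambda v: sum(a != b for a, b in zip(v, target)),
                           [0] * 8, [0, 1])
print(e)   # a low (typically zero) residual energy
```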

The Gibbs Sampler

In Gibbs sampling, instead of making a random perturbation and then deciding whether to accept or reject it, the new value is "drawn from" the desired distribution and is always accepted. First compute the conditional probability of the component $d(x_i)$ taking each of the values in the set $\Gamma$, given the present values of its neighbors:
$$P(d(x_i) \mid d(x_j),\ x_j \neq x_i) = \frac{1}{Q_{x_i}} \exp\left\{ -\frac{1}{T} \sum_{C \mid x_i \in C} V_C(d(x) \mid x \in C) \right\}$$
where
$$Q_{x_i} = \sum_{d(x_i) \in \Gamma} \exp\left\{ -\frac{1}{T} \sum_{C \mid x_i \in C} V_C(d(x) \mid x \in C) \right\}.$$
Then, the new value of the component $d(x_i)$ is drawn from this conditional probability distribution.


To clarify the meaning of "drawn from," suppose that the sample space is $\Gamma = \{0, 1, 2, 3\}$, and it was found that
$$P(d(x_i) = 0 \mid d(x_j),\ x_j \neq x_i) = 0.2$$
$$P(d(x_i) = 1 \mid d(x_j),\ x_j \neq x_i) = 0.1$$
$$P(d(x_i) = 2 \mid d(x_j),\ x_j \neq x_i) = 0.4$$
$$P(d(x_i) = 3 \mid d(x_j),\ x_j \neq x_i) = 0.3$$
A uniform random number $R$ between 0 and 1 is generated. If $0 \leq R \leq 0.2$ then $d(x_i) = 0$; if $0.2 < R \leq 0.3$ then $d(x_i) = 1$; if $0.3 < R \leq 0.7$ then $d(x_i) = 2$; and if $0.7 < R \leq 1$ then $d(x_i) = 3$.

Properties of perturbations through Gibbs sampling: (i) for any initial estimate, updating via the Gibbs sampler yields an asymptotically Gibbsian distribution; this result can be used to simulate a Gibbs random field with specified parameters. (ii) for a suitable temperature schedule, the maximum of the Gibbs distribution will be reached. Although this property is significant for MAP estimation, the required temperature schedule may be too slow for use in practice.
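The interval partition described above is ordinary inverse-CDF sampling for a discrete distribution. A small sketch; function names are illustrative:

```python
# Drawing a label from a discrete conditional distribution, as in the
# worked example (probabilities 0.2, 0.1, 0.4, 0.3 for labels 0..3).
import random

def draw_label(probs, rng=random):
    """Inverse-CDF sampling: partition [0, 1] by cumulative probabilities."""
    r = rng.random()
    cum = 0.0
    for label, p in enumerate(probs):
        cum += p
        if r <= cum:
            return label
    return len(probs) - 1   # guard against floating-point rounding

probs = [0.2, 0.1, 0.4, 0.3]
rng = random.Random(0)
counts = [0, 0, 0, 0]
for _ in range(10000):
    counts[draw_label(probs, rng)] += 1
print([c / 10000 for c in counts])   # empirical frequencies, close to probs
```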


Iterated Conditional Modes (ICM)


ICM, also referred to as the greedy algorithm, is motivated by the need to reduce the computational load of stochastic relaxation (Gibbs sampling). The sites are again visited one by one in some cyclic fashion, but there is no temperature change: the temperature is set to $T = 0$ for all iterations. ICM is therefore also referred to as the "instant freezing" case of simulated annealing. Referring to the acceptance-probability equation of SA, ICM only allows perturbations with negative $\Delta E$, since $T = 0$ effectively gives a zero probability of accepting positive energy changes. As a result, ICM solutions are likely to get trapped in local minima, and there is no guarantee that a global minimum can be reached.

It can be shown that ICM converges to the solution that maximizes the local conditional probabilities
$$P(d(x_i) \mid d(x_j),\ x_j \neq x_i) = \frac{1}{Q_{x_i}} \exp\left\{ -\frac{1}{T} \sum_{C \mid x_i \in C} V_C(d(x) \mid x \in C) \right\}$$
where
$$Q_{x_i} = \sum_{d(x_i) \in \Gamma} \exp\left\{ -\frac{1}{T} \sum_{C \mid x_i \in C} V_C(d(x) \mid x \in C) \right\}$$
at each site. Thus, ICM is usually implemented as in Gibbs sampling, but choosing at each site the value that gives the maximum local conditional probability. ICM converges much faster than SA. Moreover, when the initial solution is a reasonable estimate obtained by other means, rather than completely random, ICM reaches an acceptable solution in relatively few iterations. ICM produces good results in several applications, including image restoration [see Besag] and image segmentation [see Pappas].
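The update rule can be sketched on a toy binary denoising problem: each site takes the label with the lowest local energy (equivalently, the highest local conditional probability). The quadratic data term, `beta`, and the toy field are illustrative assumptions, not from the notes:

```python
# Sketch of ICM for a binary field: greedy site-by-site updates that
# minimize data mismatch plus the two-pixel smoothness potential
# (-beta if a neighbor agrees, +beta otherwise).
def icm(obs, init, beta=1.0, sweeps=5):
    rows, cols = len(obs), len(obs[0])
    z = [row[:] for row in init]
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                best, best_e = z[i][j], float("inf")
                for cand in (0, 1):
                    e = (obs[i][j] - cand) ** 2          # data term
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols:
                            e += -beta if z[ni][nj] == cand else beta
                    if e < best_e:
                        best, best_e = cand, e
                z[i][j] = best                           # greedy update
    return z

noisy = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0],   # the pixel at (2, 1) was flipped by noise
         [1, 1, 0, 0]]
print(icm(noisy, noisy))   # the flipped pixel is smoothed back to 1
```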


Mean Field Annealing (MFA)


Mean field annealing (MFA) originates from the "mean field approximation" idea in statistical mechanics. The main idea is that, in describing the interaction between a pixel and its neighbors, we use the mean values of the neighboring pixels. Thus, MFA is an approximation to simulated annealing that replaces the random search with a deterministic gradient descent. The implementation of MFA is not unique; details can be found in the references. In particular, [Snyder] is a good tutorial on several optimization methods. Other references: [Geman and Geman] discusses the GRF/MRF equivalence; rigorous treatment of the statistical formulations can be found in [Besag] and [Spitzer].

BAYESIAN MOTION ESTIMATION

Let $s_k = \{s_k(x)\}$, $x \in \Lambda$, denote the $k$th frame of video; let $d(x) = [d_1(x)\ d_2(x)]^T$ denote the displacement vector at site $x$; and let $d_1 = \{d_1(x)\}$ and $d_2 = \{d_2(x)\}$, $x \in \Lambda$, denote the lexicographic orderings of the $x_1$ and $x_2$ components of the displacement field from frame $k-1$ to $k$, respectively; i.e.,
$$s_k(x) = s_{k-1}(x - d(x)).$$
The problem of motion estimation can then be formulated as: given $s_k$ and $s_{k-1}$, find an estimate of $d_1$ and $d_2$. The maximum a posteriori probability (MAP) estimates of $d_1$ and $d_2$ are given by
$$(\hat{d}_1, \hat{d}_2) = \arg\max_{d_1, d_2}\ p(d_1, d_2 \mid s_k, s_{k-1}).$$

From the Bayes formula,
$$p(d_1, d_2 \mid s_k, s_{k-1}) = \frac{p(s_k \mid d_1, d_2, s_{k-1})\, p(d_1, d_2 \mid s_{k-1})}{p(s_k \mid s_{k-1})}.$$
Since the denominator is not a function of $d_1$ and $d_2$,
$$(\hat{d}_1, \hat{d}_2) = \arg\max_{d_1, d_2}\ p(s_k \mid d_1, d_2, s_{k-1})\, p(d_1, d_2 \mid s_{k-1})$$
or
$$(\hat{d}_1, \hat{d}_2) = \arg\max_{d_1, d_2}\ p(s_{k-1} \mid d_1, d_2, s_k)\, p(d_1, d_2 \mid s_k).$$
The term $p(s_k \mid d_1, d_2, s_{k-1})$ is the conditional pdf, or the "consistency (likelihood) measure," which measures how well the estimates of $d_1, d_2$ explain the observations $s_k$ given $s_{k-1}$. The term $p(d_1, d_2 \mid s_{k-1})$ is the a priori probability density, modeled by a GRF by specifying the clique potential functions according to the desired local properties of $(\hat{d}_1, \hat{d}_2)$.


Discontinuity Models

Let us introduce two auxiliary fields, the occlusion field $o$ and the line field $l$, to model the occluded/uncovered areas and the optical flow boundaries, respectively, in order to improve the motion estimation results.

The occlusion field $o = \{o(x),\ x \in \Lambda\}$:
$$o(x) = \begin{cases} 0 & d(x) \text{ is well defined} \\ 1 & x \text{ is an occlusion point.} \end{cases}$$

The line field: the a priori pdf $p(d_1, d_2 \mid s_{k-1})$ is usually chosen to favor a globally smooth motion field. To allow for the presence of discontinuities in the motion field, we make use of the line process.


The line field $l(x_i, x_j)$ models the horizontal and vertical discontinuities in the motion field (optical flow) between the sites $x_i$ and $x_j$:
$$l(x_i, x_j) = \begin{cases} 1 & \text{if there is a discontinuity between } d(x_i) \text{ and } d(x_j) \\ 0 & \text{otherwise.} \end{cases}$$
The line process $l$ conceptually occupies the dual lattice, which has a site for the line between every pair of pixel sites. Each line site can be either ON ($l = 1$) or OFF ($l = 0$), expressing the presence or absence of a discontinuity, respectively. Nonnegative potentials are assigned to each rotation-invariant line clique configuration to penalize excessive use of the ON state.


Example: Line Process Clique Potentials

[Figure: example 4-line clique configurations and their potentials: V = 0.0 (no lines), V = 0.9 (straight line), V = 1.8 (corner), V = 1.8 ("T" junction), V = 2.7 (end of a line), V = 2.7 (cross).]

An image with $4 \times 4$ pixel sites has 9 distinct 4-line cliques.


Example: Prior probabilities with and without the line field

The prior potentials slightly penalize straight lines (V = 0.9), penalize corners (V = 1.8) and "T" junctions (V = 1.8), and heavily penalize the end of a line (V = 2.7) and "crosses" (V = 2.7).

[Figure: labelings of a small lattice illustrating the prior with and without line sites between dissimilar pixels.]

The likelihood potential function puts no penalty on dissimilar pixel pairs if the line site in between is ON, and puts different amounts of penalty on different line configurations, reflecting our a priori expectation of their occurrence.


With the introduction of the auxiliary fields, the MAP estimate of $\{d_1, d_2, o, l\}$ is given by
$$\{\hat{d}_1, \hat{d}_2, \hat{o}, \hat{l}\} = \arg\max_{d_1, d_2, o, l}\ p(d_1, d_2, o, l \mid s_k, s_{k-1}).$$
Using the Bayes rule and the symmetry of the expression,
$$\{\hat{d}_1, \hat{d}_2, \hat{o}, \hat{l}\} = \arg\max_{d_1, d_2, o, l}\ p(s_{k-1} \mid d_1, d_2, o, l, s_k)\, p(d_1, d_2, o, l \mid s_k).$$
Next, we discuss the likelihood (consistency) and the a priori probability models.


The Likelihood Model


Assuming that the change in illumination from frame to frame is insignificant, and that there are no occluded/uncovered areas, the change in the intensity of a pixel along the motion trajectory is due to observation noise. Modeling the observation noise as white and Gaussian, we have
$$p(s_{k-1} \mid d_1, d_2, l, s_k) = C \exp\left\{ -\sum_{x \in \Lambda} \frac{(s_k(x) - s_{k-1}(x - d(x)))^2}{2\sigma^2} \right\}$$
where $C$ is some constant.

Now, taking the occlusion points into account,
$$p(s_{k-1} \mid d_1, d_2, o, l, s_k) = C \exp\left[ -\sum_{x \in \Lambda} \frac{(1 - o(x))\,(s_k(x) - s_{k-1}(x - d(x)))^2}{2\sigma^2} \right].$$
This pdf can be expressed more compactly in terms of an "energy function,"
$$p(s_{k-1} \mid d_1, d_2, o, l, s_k) = C \exp\left[ -U(s_k \mid d_1, d_2, o, s_{k-1}) \right]$$
where
$$U(s_k \mid d_1, d_2, o, s_{k-1}) = \frac{1}{2\sigma^2} \sum_{x \in \Lambda} (1 - o(x))\,(s_k(x) - s_{k-1}(x - d(x)))^2.$$
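The energy function above can be exercised on a 1-D toy example that also shows why the occlusion field is needed: the uncovered site has no valid displacement and would otherwise dominate the energy. The toy signals, $\sigma$, and names are illustrative:

```python
# 1-D sketch of the likelihood energy:
#   U = (1 / 2 sigma^2) * sum_x (1 - o(x)) * (s_k(x) - s_{k-1}(x - d(x)))^2
def likelihood_energy(sk, skm1, d, o, sigma=1.0):
    u = 0.0
    for x in range(len(sk)):
        if o[x]:
            continue                      # occluded/uncovered: no data term
        xm = x - d[x]                     # motion-compensated position
        if 0 <= xm < len(skm1):
            u += (sk[x] - skm1[xm]) ** 2
    return u / (2 * sigma ** 2)

skm1 = [0, 0, 9, 0, 0]                    # bright pixel at position 2
sk   = [0, 0, 0, 9, 0]                    # it moved right by 1 in frame k
d    = [0, 0, 0, 1, 0]                    # d(3) = 1 maps site 3 back to site 2
o    = [0, 0, 1, 0, 0]                    # site 2 is uncovered background
print(likelihood_energy(sk, skm1, d, o))           # 0.0: a perfect match
print(likelihood_energy(sk, skm1, [0]*5, [0]*5))   # 81.0 for zero motion
```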

The Prior Model

The prior model incorporates the locations of the optical flow boundaries and the occluded/uncovered areas, while dictating that the flow vectors vary smoothly within each optical flow boundary. The a priori model can be expressed as
$$p(d_1, d_2, o, l \mid s_k) = \exp\left[ -U(d_1, d_2, o, l \mid s_k) \right]$$
where
$$U(d_1, d_2, o, l \mid s_k) = \alpha_d U(d_1, d_2 \mid l) + \alpha_o U(o \mid l) + \alpha_l U(l \mid s_k)$$
$$= \alpha_d \sum_{c \in C_d} V_c(d_1, d_2 \mid l) + \alpha_o \sum_{c \in C_o} V_c(o \mid l) + \alpha_l \sum_{c \in C_l} V_c(l \mid s_k).$$
Here $C_d$, $C_o$, and $C_l$ denote the sets of all cliques for the displacement, occlusion, and line fields, respectively; $V_c(\cdot)$ are the corresponding clique potential functions; and $\alpha_d$, $\alpha_o$, and $\alpha_l$ are positive constants.


ESTIMATION ALGORITHMS
The minimization of the overall potential is an exceedingly difficult problem: there are several hundred thousand unknowns for a reasonably sized image, and the criterion function is nonconvex. For example, for a $256 \times 256$ image there are 65,536 motion vectors (131,072 components), 65,536 occlusion labels, and 131,072 line field labels, for a total of 327,680 unknowns. An additional complication is that the motion vector components are continuous-valued, while the occlusion and line field labels are discrete-valued.


Three-step iteration of Dubois and Konrad:

1. Given the best estimates of the auxiliary fields $\hat{o}$ and $\hat{l}$, update the motion field by minimizing
$$\min_{d_1, d_2}\ U(g_k \mid d_1, d_2, \hat{o}, g_{k-1}) + \alpha_d U_d(d_1, d_2 \mid \hat{l}, g_k).$$
This minimization can be done by Gauss-Newton optimization.

2. Given the best estimates of $\hat{d}_1$, $\hat{d}_2$, and $\hat{l}$, update $o$ by minimizing
$$\min_{o}\ U(g_k \mid \hat{d}_1, \hat{d}_2, o, g_{k-1}) + \alpha_o U_o(o \mid \hat{l}, g_k).$$
An exhaustive search or the ICM method can be employed in this step.

3. Finally, given the best estimates of $\hat{d}_1$, $\hat{d}_2$, and $\hat{o}$, update $l$ by minimizing
$$\min_{l}\ U_d(\hat{d}_1, \hat{d}_2 \mid l, g_{k-1}) + \alpha_o U_o(\hat{o} \mid l, g_k) + \alpha_l U_l(l \mid g_k).$$

Once all three fields have been updated, the process is repeated until a suitable convergence criterion is satisfied. This procedure has been reported to give good results.


LECTURE 9 MOTION SEGMENTATION


1. Basics of Segmentation: Thresholding / Clustering / MAP Segmentation
2. Foreground/Background Separation
3. Dominant Motion vs. Parametric Clustering Methods
4. Direct Methods vs. Optical Flow Segmentation
5. Simultaneous MAP Motion Estimation and Segmentation
6. Integration of Color and Motion Segmentation


WHY OBJECT/MOTION SEGMENTATION?


- Help improve optical flow estimation in the presence of multiple motions
- Help improve 3-D motion and structure estimation
- Object-based video coding
- Object-based editing (synthetic transfiguration)


Image vs. Optical Flow Segmentation


Segmentation is based on a feature (vector); e.g., image segmentation usually refers to segmentation based upon the gray scale (or color) of the pixels.

Applying standard image segmentation methods directly to optical flow segmentation (i.e., using the velocity vector as the feature) may not be useful, since 3-D motion usually generates spatially varying optical flow fields. For example, within a purely rotating object there is no flow at the center of rotation, and the magnitude of the flow vectors increases with the distance of a point from the center of rotation.

Thus, optical flow segmentation needs to be based on some parametric description of the motion field.


2-D Optical Flow Estimation and Segmentation


A realistic scene generally contains multiple motions; smoothness constraints cannot be imposed across motion boundaries.

[Figure: a scene with differently moving regions, labeled Background, Calendar, Ball, and Train.]


3-D Motion/Structure Estimation and Segmentation

Assume that the object surface is composed of planar patches,
$$aX_1 + bX_2 + cX_3 = 1.$$
The 3-D rigid motion of the object is modeled as
$$\begin{bmatrix} X_1' \\ X_2' \\ X_3' \end{bmatrix} = R \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} + T.$$
Then,
$$\begin{bmatrix} X_1' \\ X_2' \\ X_3' \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ a_7 & a_8 & a_9 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$$
where
$$A = R + T\,[a\ \ b\ \ c].$$


Scene Segmentation

Orthographic projection of the object coordinates onto the image plane yields
$$x_1' = a_1 x_1 + a_2 x_2 + a_3, \qquad x_2' = a_4 x_1 + a_5 x_2 + a_6.$$
Perspective projection of the object coordinates onto the image plane yields
$$x_1' = \frac{a_1 x_1 + a_2 x_2 + a_3}{a_7 x_1 + a_8 x_2 + 1}, \qquad x_2' = \frac{a_4 x_1 + a_5 x_2 + a_6}{a_7 x_1 + a_8 x_2 + 1}.$$
Assuming the scene is represented by a 3-D mesh (wireframe) model with planar patches, different parametric models are needed for:
- different moving objects, which have different sets of 3-D rigid motion parameters;
- different planar patches, which have different normal vectors.
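The two projected mappings above can be written as small functions; with $a_7 = a_8 = 0$ the perspective map reduces to the orthographic (affine) one. Parameter values are illustrative:

```python
# Orthographic (6-parameter) and perspective (8-parameter) mappings.
def orthographic_map(a, x1, x2):
    return (a[0]*x1 + a[1]*x2 + a[2], a[3]*x1 + a[4]*x2 + a[5])

def perspective_map(a, x1, x2):
    den = a[6]*x1 + a[7]*x2 + 1.0
    return ((a[0]*x1 + a[1]*x2 + a[2]) / den,
            (a[3]*x1 + a[4]*x2 + a[5]) / den)

a = (1, 0, 2, 0, 1, -1, 0, 0)             # a7 = a8 = 0: pure affine
print(orthographic_map(a[:6], 3, 4))      # (5, 3)
print(perspective_map(a, 3, 4))           # (5.0, 3.0)
```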


Thresholding

Consider the bi-modal histogram $h(s)$ of an image $s(x_1, x_2)$ composed of a light object on a dark background.

[Figure: a bi-modal histogram $h(s)$ over $[s_{\min}, s_{\max}]$, with a threshold $T$ between the two peaks.]

To extract the object from the background, select a threshold $T$ that separates the two dominant modes (peaks); then
$$z(x_1, x_2) = \begin{cases} 1 & \text{if } s(x_1, x_2) > T \\ 0 & \text{otherwise} \end{cases}$$
indicates the object and background pixels.
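The threshold rule above is a one-liner; the image and $T$ below are illustrative:

```python
# Bi-modal thresholding: z = 1 where s(x1, x2) > T, else 0.
def threshold(image, t):
    return [[1 if s > t else 0 for s in row] for row in image]

image = [[ 20,  30, 210],
         [ 25, 220, 215],
         [ 18,  22,  31]]
print(threshold(image, 128))   # [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
```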


Multilevel Thresholding

If the histogram has $M$ significant modes (peaks), where $M > 2$, then we need $M - 1$ thresholds to separate the image into $M$ segments. Of course, reliable determination of the thresholds becomes more difficult as the number of modes increases.

Global/Local/Dynamic Thresholding

In general, the threshold $T$ is a function
$$T = T(x_1, x_2, s(x_1, x_2), p(x_1, x_2))$$
where $(x_1, x_2)$ are the coordinates of a point, $s(x_1, x_2)$ is the intensity of the point, and $p(x_1, x_2)$ is some local property of the point, such as the average intensity of a local neighborhood. If $T$ depends only on $s(x_1, x_2)$, it is called a global threshold. If $T$ depends on both $s(x_1, x_2)$ and $p(x_1, x_2)$, it is a local threshold. If, in addition, it depends on $(x_1, x_2)$, it is called a dynamic threshold. Methods for determining the threshold(s) are discussed in Gonzalez and Wintz.


Clustering via the K-Means Algorithm

Suppose we wish to segment an image into $K$ regions based on the gray values of the pixels. Let $x = (x_1, x_2)$ denote the coordinates of a pixel and $s(x)$ its gray level.

The K-means method of clustering minimizes the performance index
$$J = \sum_{k=1}^{K} \sum_{x \in \Lambda_k^{(i)}} \left\| s(x) - \mu_k^{(i+1)} \right\|^2$$
where $\mu_k$ denotes the $k$th cluster center and $\Lambda_k^{(i)}$ the set of samples assigned to it at iteration $i$.


The K-means Algorithm:

1. Choose $K$ initial cluster centers $\mu_1^{(1)}, \mu_2^{(1)}, \ldots, \mu_K^{(1)}$.
2. At the $i$th iteration, distribute the pixels $x$ among the $K$ clusters using the relation
$$x \in \Lambda_j^{(i)} \quad \text{if } \left\| s(x) - \mu_j^{(i)} \right\| < \left\| s(x) - \mu_k^{(i)} \right\|$$
for all $k = 1, 2, \ldots, K$, $k \neq j$, where $\Lambda_k^{(i)}$ denotes the set of samples whose cluster center is $\mu_k^{(i)}$.
3. Compute the new cluster centers $\mu_k^{(i+1)}$, $k = 1, 2, \ldots, K$, as the sample mean of all samples in $\Lambda_k^{(i)}$:
$$\mu_k^{(i+1)} = \frac{1}{N_k} \sum_{x \in \Lambda_k^{(i)}} s(x), \qquad k = 1, 2, \ldots, K$$
where $N_k$ is the number of samples in $\Lambda_k^{(i)}$.
4. If $\mu_k^{(i+1)} = \mu_k^{(i)}$ for all $k = 1, 2, \ldots, K$, the algorithm has converged and the procedure is terminated. Otherwise, go to step 2.
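The four steps above, sketched for scalar gray levels; the function name and toy data are illustrative:

```python
# Minimal K-means on 1-D gray levels: assign to nearest center,
# recompute centers as sample means, stop when centers are unchanged.
def kmeans_1d(samples, centers, max_iter=100):
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for s in samples:                        # step 2: nearest center
            k = min(range(len(centers)), key=lambda i: abs(s - centers[i]))
            clusters[k].append(s)
        new = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]  # step 3: sample means
        if new == centers:                       # step 4: convergence
            break
        centers = new
    return centers

# Two well-separated gray-level populations
pixels = [10, 12, 11, 9, 200, 198, 202, 201]
print(kmeans_1d(pixels, [0, 255]))   # [10.5, 200.25]
```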


MAP Segmentation
Clustering with Spatial Smoothness Constraints

Let $z(x)$ denote the segmentation label at the pixel $x$, i.e., $1 \leq z(x) \leq K$, and let $s(x)$ denote the gray level of the pixel. Define $z$ and $s$ as the lexicographic orderings of the segmentation label field and the gray level field, respectively.

The maximum a posteriori probability (MAP) estimate of the segmentation label field maximizes the a posteriori probability of the segmentation labels given the pixel gray levels,
$$p(z \mid s) \propto p(s \mid z)\, p(z)$$
where $p(s \mid z)$ is the conditional probability density of the image gray levels given the pixel labels, and $p(z)$ is the prior density of the segmentation labels.


The A Priori Probability Density

The prior pdf of the segmentation labels is modeled by a GRF,
$$p(z) = \frac{1}{Q} \sum_{\omega} \exp\left\{ -\sum_{C} V_C(\omega) \right\} \delta(z - \omega)$$
where $Q$ is the partition function (normalizing constant) and the summation is over all cliques $C$. We consider only one- and two-point cliques.

The single-pixel clique potentials are defined as
$$V_C(z(x)) = \alpha_i \quad \text{if } z(x) = i \text{ and } x \in C, \text{ for all } i.$$
They reflect our a priori knowledge of the probabilities of the different region types: the smaller $\alpha_i$, the higher the likelihood of region $i$.

The two-point clique potentials are defined as
$$V_C(z(x_1), z(x_2)) = \begin{cases} -\beta & \text{if } z(x_1) = z(x_2) \text{ and } x_1, x_2 \in C \\ +\beta & \text{if } z(x_1) \neq z(x_2) \text{ and } x_1, x_2 \in C \end{cases}$$
where $\beta$ is a positive parameter, so that two neighboring pixels are more likely to belong to the same class than to different classes. The larger the value of $\beta$, the stronger the smoothness constraint.


The Conditional Probability Density

The conditional density for region $k$ is modeled as a white Gaussian process with mean $\mu_k$ and variance $\sigma^2$. Thus, the a posteriori density has the form
$$p(z \mid s) \propto \exp\left\{ -\frac{1}{2\sigma^2} \sum_{x} \left( s(x) - \mu_{z(x)} \right)^2 - \sum_{C} V_C(z) \right\}.$$
Maximization of this a posteriori density with respect to $z$ can be performed by simulated annealing. Observe that if we turn off the spatial smoothness constraints, the result is identical to the K-means algorithm.


Adaptive MAP Method


The MAP method can be made adaptive by letting the cluster means $\mu_k$ vary slowly with the pixel location $x$. Then,
$$p(s \mid z) \propto \exp\left\{ -\sum_{x} \left( s(x) - \mu_{z(x)}(x) \right)^2 / 2\sigma^2 \right\}.$$
The quantities $\mu_k(x)$ are estimated at each site $x$, for all $k = 1, \ldots, K$, as the sample mean of those pixels with label $k$ within a local window about the pixel $x$.

[Figure: a two-class ($K = 2$) segmentation label field with a local estimation window.]


Computational Issues

To reduce the computational load to a reasonable level: (1) the space-varying mean estimates are computed on a sparse grid and then interpolated; (2) the optimization is performed via the ICM method. The algorithm starts with a window size equal to the image size and reduces the window size by 4 after each ICM optimization cycle. The ICM is equivalent to maximizing the local a posteriori pdf
$$p(z(x_i) \mid s(x),\ z(x_j),\ x_j \in N_{x_i}) \propto \exp\left\{ -\frac{1}{2\sigma^2} \left( s(x_i) - \mu_{z(x_i)}(x_i) \right)^2 - \sum_{C \mid x_i \in C} V_C(z) \right\}.$$

Ref: T. N. Pappas, "An Adaptive Clustering Algorithm for Image Segmentation," IEEE Trans. on Signal Proc., vol. SP-40, pp. 901-914, April 1992.


Multi-Channel Segmentation

Let $y(x) = (v_1(x), v_2(x), s(x))$. Assign a single label $z(x)$ to each element of $y(x)$ to maximize
$$p(z \mid y) \propto p(y \mid z)\, p(z).$$
Assuming $v_1$, $v_2$, and $s$ are conditionally independent given $z$,
$$p(v_1, v_2, s \mid z) = p(v_1 \mid z)\, p(v_2 \mid z)\, p(s \mid z)$$
which results in
$$p(v_1, v_2, s \mid z) = \exp\left\{ -\sum_x \left[ \frac{1}{2\sigma_1^2}\left(v_1(x) - \mu^{v_1}_{z(x)}(x)\right)^2 + \frac{1}{2\sigma_2^2}\left(v_2(x) - \mu^{v_2}_{z(x)}(x)\right)^2 + \frac{1}{2\sigma_3^2}\left(s(x) - \mu^{s}_{z(x)}(x)\right)^2 \right] \right\}.$$
The prior pdf is a Gibbs distribution with a 4-pixel neighborhood system and 2-pixel cliques.


CHANGE DETECTION

Compare two images pixel by pixel by forming a difference image
$$FD_{(k,\,k-1)}(x_1, x_2) = s(x_1, x_2, k) - s(x_1, x_2, k-1).$$
Segment the scene into moving vs. stationary parts by thresholding the difference image,
$$z(x_1, x_2) = \begin{cases} 1 & \text{if } |FD_{(k,\,k-1)}(x_1, x_2)| > T \\ 0 & \text{otherwise} \end{cases}$$
where $T$ is an appropriate threshold. This approach assumes that the illumination remains more or less constant from frame to frame. The method may produce isolated 1s in the segmentation mask $z(x_1, x_2)$ due to noise in the images.
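Frame differencing and thresholding as defined above; the frames and $T$ are illustrative:

```python
# Change detection: threshold the absolute frame difference.
def change_mask(frame_k, frame_km1, t):
    return [[1 if abs(a - b) > t else 0
             for a, b in zip(ra, rb)]
            for ra, rb in zip(frame_k, frame_km1)]

prev = [[50, 50, 50], [50, 50, 50]]
curr = [[50, 90, 50], [52, 50, 50]]   # one moving pixel plus slight noise
print(change_mask(curr, prev, 10))    # [[0, 1, 0], [0, 0, 0]]
```

Note how the small noise difference (52 vs. 50) falls below the threshold, while a too-small $T$ would produce exactly the isolated 1s mentioned above.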


Accumulative Differences

To eliminate sporadic "1"s in the segmentation mask, we may add memory to the motion detection process by forming accumulative difference images. Let $s(x_1, x_2, k), s(x_1, x_2, k-1), \ldots, s(x_1, x_2, k-n)$ be a sequence of images, and let $s(x_1, x_2, k)$ be the reference frame. An accumulative difference image is formed by comparing this reference image with every subsequent image in the sequence. A counter for each pixel location in the accumulative image is incremented every time the difference between the reference image and the next image in the sequence at that location exceeds the threshold.
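The counter-based scheme above can be sketched directly; the frames and threshold are illustrative:

```python
# Accumulative difference image: a per-pixel counter incremented
# whenever |reference - frame| exceeds the threshold.
def accumulative_difference(reference, frames, t):
    rows, cols = len(reference), len(reference[0])
    acc = [[0] * cols for _ in range(rows)]
    for frame in frames:
        for i in range(rows):
            for j in range(cols):
                if abs(reference[i][j] - frame[i][j]) > t:
                    acc[i][j] += 1
    return acc

ref = [[10, 10], [10, 10]]
seq = [[[10, 40], [10, 10]],
       [[10, 45], [10, 10]],
       [[10, 50], [12, 10]]]     # persistent change at (0, 1), noise at (1, 0)
print(accumulative_difference(ref, seq, 5))   # [[0, 3], [0, 0]]
```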


MOTION SEGMENTATION METHODS


- Dominant motion approach (Diehl; Hotter and Thoma; Bergen et al.; Burt et al.; Irani et al.)
- Parameter clustering approach (Adiv; Wang and Adelson)
- Simultaneous Bayesian estimation and segmentation (Chang et al.)
- Region-based approach using color information (Eren et al.)


DOMINANT MOTION APPROACH


1. Compute the dominant 2-D translation in the entire region of analysis.
2. Segment the region that corresponds to the computed motion by detecting "stationary pixels" in the registered images.
3. Employ a higher-order (affine, perspective) model within this region for improved motion estimation.
4. Iterate steps 2-4 until convergence.
5. Proceed to the next dominant object by excluding the support of previously computed dominant objects.


A Direct Method
i) Parametric modeling of the 2-D motion field: define a transform with a set of parameters that maps pixels from frame $k$ to frame $k+1$, and estimate the parameters of this transform in the image domain.

ii) Segmentation: regions undergoing the same 3-D motion have the same set of mapping parameters; thus, assign flow vectors having the same mapping parameters to the same class.

The process iterates between parameter estimation and segmentation until a satisfactory result is obtained.

Parametric Modeling of the 2-D Motion Field

Let
$$g_k(x) = s_k(x) + n_k(x)$$
$$g_{k+1}(x) = (1 + \alpha)\, s_{k+1}(x') + \beta + n_{k+1}(x)$$
where $\alpha$ and $\beta$ describe global illumination changes and $n_k(x)$ denotes the noise. Assuming no occlusion effects,
$$s_{k+1}(x') = s_k(x).$$
The transformation from the coordinate system $x$ to $x'$ is given by
$$x' = h(x;\, \theta)$$
where $\theta$ is a parameter vector. The form of $h(x;\, \theta)$ depends on:
1) the 3-D motion of the object;
2) the projection model from 3-D space onto the camera plane;
3) the model of the object surface (planar, quadratic, etc.).

Examples of Coordinate Transforms

1) Planar surface, perspective projection: Let $x$ and $x'$ denote image plane coordinates under the perspective projection. Assume that the surface of the moving object is planar, $X_3 = aX_1 + bX_2 + c$. Then the transformation is given by
$$x_1' = \frac{a_1 x_1 + a_2 x_2 + a_3}{a_7 x_1 + a_8 x_2 + 1}, \qquad x_2' = \frac{a_4 x_1 + a_5 x_2 + a_6}{a_7 x_1 + a_8 x_2 + 1}$$
where $\theta = (a_1, \ldots, a_8)$ is the vector of mapping parameters.

2) Planar surface, orthographic projection: In the case of parallel (orthographic) projection, we have the affine transform
$$x_1' = c_1 x_1 + c_2 x_2 + c_3, \qquad x_2' = c_4 x_1 + c_5 x_2 + c_6$$
where $\theta = (c_1, \ldots, c_6)$ is the vector of mapping parameters.
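Given point correspondences, the six affine mapping parameters $(c_1, \ldots, c_6)$ can be recovered by solving two small linear systems; three non-collinear correspondences suffice. This is a generic linear-algebra sketch, not the gradient-based estimator discussed later, and all names are illustrative:

```python
# Fit the affine map x' = (c1 x1 + c2 x2 + c3, c4 x1 + c5 x2 + c6)
# from three point correspondences.
def solve3(m, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [bi] for row, bi in zip(m, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[r][3] / a[r][r] for r in range(3)]

def fit_affine(points, mapped):
    """(c1, c2, c3, c4, c5, c6) from three correspondences."""
    m = [[x1, x2, 1.0] for (x1, x2) in points]
    c123 = solve3(m, [p[0] for p in mapped])
    c456 = solve3(m, [p[1] for p in mapped])
    return c123 + c456

pts = [(0, 0), (1, 0), (0, 1)]
out = [(3, -2), (4, -2), (3, -1)]          # a pure translation by (3, -2)
print(fit_affine(pts, out))                # approximately [1, 0, 3, 0, 1, -2]
```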

3) Quadratic surface, orthographic projection: Let the surface be characterized by
$$X_3 = a_{11} X_1^2 + a_{12} X_1 X_2 + a_{22} X_2^2 + a_{13} X_1 + a_{23} X_2 + a_{33}$$
and let the equations
$$x_1 = m X_1, \quad x_2 = m X_2, \qquad x_1' = m' X_1', \quad x_2' = m' X_2'$$
describe the parallel projection. Substituting these into the 3-D displacement model and grouping terms with the same exponent, we arrive at the 12-parameter quadratic transform
$$x_1' = a_1 x_1^2 + a_2 x_2^2 + a_3 x_1 x_2 + a_4 x_1 + a_5 x_2 + a_6$$
$$x_2' = b_1 x_1^2 + b_2 x_2^2 + b_3 x_1 x_2 + b_4 x_1 + b_5 x_2 + b_6.$$
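Evaluating the 12-parameter quadratic transform is direct; with the quadratic terms set to zero it reduces to the affine transform of case 2. Parameter values are illustrative:

```python
# The 12-parameter quadratic transform:
#   x1' = a1 x1^2 + a2 x2^2 + a3 x1 x2 + a4 x1 + a5 x2 + a6  (and same for x2')
def quadratic_map(a, b, x1, x2):
    x1p = a[0]*x1**2 + a[1]*x2**2 + a[2]*x1*x2 + a[3]*x1 + a[4]*x2 + a[5]
    x2p = b[0]*x1**2 + b[1]*x2**2 + b[2]*x1*x2 + b[3]*x1 + b[4]*x2 + b[5]
    return (x1p, x2p)

a = (0, 0, 0, 1, 0, 2)                # x1' = x1 + 2 (purely affine)
b = (0, 0, 0, 0, 1, -1)              # x2' = x2 - 1
print(quadratic_map(a, b, 3, 4))      # (5, 3)
bend = (0.5, 0, 0, 1, 0, 0)          # add curvature in x1
print(quadratic_map(bend, b, 3, 4))   # (7.5, 3)
```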

' &

Digital Video Processing

c 1995-98 Prof. A. M. Tekalp

Remarks: The quadratic transform is generally used in optical ow segmentation and object-oriented description, because it provides a good approximation to many real life images. It is not always possible to completely determine the 3-D motion of the object and the explicit surface structure using only the mapping parameters of the transform h(x ). But for image coding applications this does not pose a serious problem, since the main interest is the prediction of the next frame from the current frame. The mapping approach that is presented is not capable of handling occlusion e ects.


Algorithms for Mapping Parameter Estimation

Linear algorithms exist to find the mapping parameters given spatio-temporal intensity gradients. The contents of the images s_k(x) and s_{k+1}(x) must be sufficiently similar for the estimation to be successful.

We estimate the mapping parameters to minimize the error function

    J(â) = (1/2) E{ (s̃_{k+1}(x; â) − s_{k+1}(x))² }

where s̃_{k+1}(x; â) denotes the prediction of frame k+1 from frame k.
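For the affine model the estimation step has a closed-form solution: each component of x' is linear in the parameters, so c follows from two independent linear least-squares fits over point correspondences. A minimal sketch (the function name is ours):

```python
import numpy as np

def fit_affine(x, xp):
    """Least-squares fit of the 6 affine parameters c1..c6 from point
    correspondences. x, xp: (N, 2) arrays of coordinates before/after motion."""
    A = np.column_stack([x[:, 0], x[:, 1], np.ones(len(x))])  # design matrix
    c123, *_ = np.linalg.lstsq(A, xp[:, 0], rcond=None)       # x1' equation
    c456, *_ = np.linalg.lstsq(A, xp[:, 1], rcond=None)       # x2' equation
    return np.concatenate([c123, c456])
```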


Segmentation Based on Mapping Parameters

Each object is characterized by a specific mapping vector a. Thus, segmentation and motion estimation are treated as a combined problem.
- In the first step, the regions which have changed between s_k(x) and s_{k+1}(x) are determined (change detection).
- All isolated connected regions of the resulting segmentation are defined as objects of hierarchy level one. For each of these objects, a parameter vector a of a transform h(x; a) which relates the two images is estimated.
- Next, those regions of each object where the vector a is not valid are removed. These regions are defined as objects of the second hierarchy level.
- For the objects of level two and the remaining parts of level one, the parameter vectors are estimated.
- Repeat the procedure until the parameter vector for each region is consistent with the region.


PARAMETER CLUSTERING APPROACH

- Dense motion estimation (hierarchical, 3-step Lucas-Kanade).
- Start with randomly selected seed blocks (initial regions); estimate affine parameters over each block.
- Merge regions with "similar" affine parameters to reduce the number of classes.
- Update regions by classifying each pixel into one of the motion classes, based on the similarity of the dense and the corresponding affine motion vectors, wherever a "good" match can be found.
- Re-estimate affine parameters over the updated regions, and iterate until convergence.
- Classify all "unassigned pixels" based on a DFD criterion.
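The merge step can be sketched as a greedy clustering of the per-block affine parameter vectors. The Euclidean distance threshold and the running-mean class representative below are illustrative choices, not the specific similarity rule used in the lecture:

```python
import numpy as np

def merge_similar(params, tol):
    """Greedy merge of affine parameter vectors: each vector joins the first
    existing class whose representative lies within Euclidean distance tol;
    representatives are running means of their members. Returns class labels."""
    reps, counts, labels = [], [], []
    for p in params:
        for k, r in enumerate(reps):
            if np.linalg.norm(p - r) < tol:
                counts[k] += 1
                reps[k] = r + (p - r) / counts[k]  # update running mean
                labels.append(k)
                break
        else:
            reps.append(np.asarray(p, dtype=float))  # open a new class
            counts.append(1)
            labels.append(len(reps) - 1)
    return labels
```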


Optical Flow Segmentation


Problem Statement: Segment a scene into independently moving objects.
Feature Selection:
- We cannot use 2-D motion vectors directly, since in most cases motion vectors vary within a single 3-D moving object (e.g., under rotation).
- Instead, use the underlying 3-D motion parameters of the objects.
An Application: Layered video representation.


CLUSTERING METHODS
1. Estimate the optical flow field.
2. Divide the motion field into rectangular blocks.
3. For each block, estimate the affine parameters by the method of linear least squares.
4. Threshold the motion residual by T_stage to determine reliable blocks.


5. Apply the merge procedure to find the affine models to be used in pixel assignment.
6. Find the pixels that fall into the computed clusters using the velocity-checking criterion.
7. Delete all the assigned pixels from the image so that they will not be used in the next stage.
8. Eliminate small regions from the map obtained in step 7.
9. If all the pixels are assigned, stop; otherwise go to step 4.
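Steps 2-4 can be sketched for a single block of a dense flow field: fit an affine model to each flow component by linear least squares and declare the block reliable when the root-mean-square fitting residual falls below the threshold. The helper below is ours, not lecture code:

```python
import numpy as np

def block_affine_fit(v, top, left, size):
    """Fit affine motion over one square block of a dense flow field
    v (H, W, 2); returns (params, rms_residual). Coordinates are pixel indices."""
    ys, xs = np.mgrid[top:top + size, left:left + size]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(size * size)])
    params, res = [], 0.0
    for comp in range(2):                       # horizontal, then vertical flow
        b = v[top:top + size, left:left + size, comp].ravel()
        c, *_ = np.linalg.lstsq(A, b, rcond=None)
        params.append(c)
        res += np.mean((A @ c - b) ** 2)        # mean squared fitting error
    return np.concatenate(params), np.sqrt(res)
```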


MAP SEGMENTATION

Maximize the a posteriori pdf of the label field

    p(z | v1, v2) = p(v1, v2 | z) p(z) / p(v1, v2)

given the optical flow data, where p(v1, v2 | z) is the conditional pdf of the optical flow data given the segmentation, and p(z) is the prior probability of the segmentation.
1) The segmentation field is modeled by a spatio-temporal Markov random field (MRF) to impose continuity (smoothness) of the labels.
2) The conditional pdf models how well we can predict the measured (estimated) optical flow field.
Ref.: Murray and Buxton.

The Conditional Probability

In the presence of noise n, the joint probability of the data given the segmentation labels is related to the noise distribution P_n(n) by

    p(v1, v2 | z) = P_n(n)

Assuming that the noise is white and Gaussian, with zero mean and variance σ²,

    P_n(n) = (1 / (2πσ²)^(d(Ω)/2)) exp{ −(1/(2σ²)) Σ_x ε²(x) }

where

    ε²(x) = ||v(x) − ṽ(x)||²

which depends on the way the optical flow data are distributed among the various scene facets.

The Prior Probability

The prior probability of the interpretation is modeled by an MRF with respect to some local neighborhood. Thus, it is given by a Gibbs distribution, which effectively introduces local constraints on the interpretation:

    p(z) = (1/Q) Σ_{ω∈Ω} exp{ −U(z) } δ(z − ω)

where Q is the partition function

    Q = Σ_{ω∈Ω} exp{ −U(ω) }

and U(ω) is the sum of local potentials. Taking the logarithm of the MAP criterion, the maximization of the a posteriori probability distribution becomes minimization of the cost function

    (1/(2σ²)) Σ_x ε²(x) + U(z)

The Algorithm:
1. Start with an initial labeling z of the optical flow vectors. Calculate the mapping parameters a = [a1 ... a8]^T for each region using least-squares fitting. Set the initial temperature for simulated annealing (SA).
2. Scan the pixel sites according to a predefined convention. At each site x_i:
   (a) Perturb the label z_i randomly.
   (b) Decide whether to accept or reject this perturbation, based on the change in the cost function

       ΔC = (1/(2σ²)) Σ_x ε²(x) + Σ_{x_j ∈ N_{x_i}} V_C(z(x_i), z(x_j))

3. After all pixel sites are visited once, re-estimate the mapping parameters for each region in the least-squares sense, based on the new segmentation label configuration.
4. Exit if a stopping criterion is satisfied. Otherwise, lower the temperature according to the schedule, and go to step 2.
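The annealing loop of steps 2 and 4 can be sketched as follows, with the per-pixel data term ε² precomputed for each candidate motion model and a simple Potts-type clique potential standing in for V_C. This is a minimal sketch under those assumptions; the per-sweep re-estimation of mapping parameters (step 3) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sa_segment(eps2, n_labels, sigma2=1.0, beta=1.0, T0=2.0, cooling=0.95, sweeps=30):
    """Simulated-annealing label update. eps2[k] is an (H, W) map of the
    motion-compensation error eps^2(x) under motion model k; the clique
    potential is a Potts term of strength beta over the 4-neighborhood."""
    H, W = eps2[0].shape
    z = rng.integers(n_labels, size=(H, W))     # random initial labeling
    T = T0
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                old = z[i, j]
                new = rng.integers(n_labels)    # random label perturbation
                if new == old:
                    continue
                # change in the data term (1/(2 sigma^2)) eps^2
                dC = (eps2[new][i, j] - eps2[old][i, j]) / (2.0 * sigma2)
                # change in the Potts clique potentials
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        dC += beta * (int(new != z[ni, nj]) - int(old != z[ni, nj]))
                if dC < 0 or rng.random() < np.exp(-dC / T):
                    z[i, j] = new               # accept the perturbation
        T *= cooling                            # lower the temperature
    return z
```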

Potential Functions for the Prior Model

The spatial and temporal continuity of the segmentation labels can be enforced by means of spatial and temporal Gibbs potential functions, where

    U = Σ_{x_i} Σ_{x_j ∈ N_{x_i}} V_2s(z(x_i), z(x_j), L_ij) + Σ V(L) + Σ_{x_i} Σ_{x_k ∈ N_{x_i}} V_2t(z(x_i), z(x_k))

with a potential V(L) on the line field L, and

    V_2s(z(x_i), z(x_j), L_ij) =  −α_s   if z(x_i) = z(x_j) and L_ij is OFF
                                   α_s   if z(x_i) ≠ z(x_j) and L_ij is OFF
                                   0     if L_ij is ON

    V_2t(z(x_i), z(x_k)) =  −α_t   if z(x_i) = z(x_k)
                             α_t   otherwise

Here α_s and α_t are positive parameters which control the strength of the spatial and temporal continuity constraints, respectively.
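The two clique potentials can be written down directly; the line process is represented by a boolean flag, and α_s = α_t = 1 are illustrative defaults:

```python
def v2s(zi, zj, line_on, a_s=1.0):
    """Spatial clique potential V_2s: rewards equal labels across cliques
    where the line process L_ij is OFF; contributes nothing when it is ON."""
    if line_on:
        return 0.0
    return -a_s if zi == zj else a_s

def v2t(zi, zk, a_t=1.0):
    """Temporal clique potential V_2t: rewards temporally consistent labels."""
    return -a_t if zi == zk else a_t
```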


Simultaneous Motion Estimation and Segmentation

- The optical flow segmentation methods are limited by the accuracy of the available optical flow estimates.
- Combine motion estimation and segmentation under a single MAP estimation framework in a mutually beneficial way.
- The posterior probability is

    p(v1, v2, z | g_k, g_{k+1}) = p(g_{k+1} | g_k, v1, v2, z) p(v1, v2 | z, g_k) p(z | g_k) / p(g_{k+1} | g_k)

- p(g_{k+1} | g_k, v1, v2, z) is characterized by the DFD, modeled by a Gaussian distribution.
- p(z | g_k) is modeled as Gibbsian for connected regions.


p(v1, v2 | z, g_k) relates the 2-D motion estimates to the 3-D scene:

    p(v1, v2 | z, g_k) = p(v1, v2 | z) = (1/Q) exp{ −U(v1, v2 | z) }

where

    U(v1, v2 | z) = α Σ_x ||v(x) − ṽ(x)||² + β Σ_{x_i} Σ_{x_j ∈ N_{x_i}} ||v(x_i) − v(x_j)||² δ(z(x_i) − z(x_j))

Maximizing the a posteriori pdf is equivalent to minimizing the cost function

    C = U(g_{k+1} | g_k, v1, v2, z) + U(v1, v2 | z) + U(z)

The minimization is performed in two steps, alternating between estimation of the optical flow and the model parameters, and update of the segmentation labels.


1. Estimate the optical flow field (v1, v2), assuming that the segmentation field z is given. This step involves minimization of the modified cost function

    C1 = Σ_x ε²_{v1,v2}(x) + α Σ_x ||v(x) − ṽ(x)||² + β Σ_{x_i} Σ_{x_j ∈ N_{x_i}} ||v(x_i) − v(x_j)||² δ(z(x_i) − z(x_j))

which is composed of all the terms in C that contain (v1, v2). While the first term indicates how well v explains our observations, the second and third terms impose prior constraints on the motion estimates: they should conform with the parametric flow model, and they should vary smoothly within each region. The algorithm is initialized with an optical flow field that is estimated using a global smoothness constraint. Given this estimate, we initialize the segmentation labels using a procedure similar to that of Wang and Adelson.


2. Estimate the segmentation field z, assuming the optical flow vectors (v1, v2) are given. This step involves minimization of all the terms in C that contain z as well as (v1', v2'), the projection of the 3-D motion. The modified cost function is given by

    C2 = Σ_x ε²_{v1,v2}(x) + α Σ_x ||v(x) − v'(x)||² + Σ_{x_i} Σ_{x_j ∈ N_{x_i}} V_2(z(x_i), z(x_j))

The first term quantifies how well the projected motion (v1', v2'), which depends on z and the mapping parameters, compensates for the motion. The second term measures the consistency of (v1', v2') with (v1, v2). The third term is related to the prior probability of the present configuration of the segmentation labels. This step includes the least-squares estimation of the mapping parameters a. A hierarchical implementation of this algorithm is also possible, by forming successive low-pass filtered versions of g_k and g_{k+1}.


Flowchart

    Input video
      -> 2-D dense motion estimation (e.g., Lucas-Kanade)
      -> Multi-stage parametric motion segmentation (ext. of Wang-Adelson)
      -> Update motion field given segmentation (Chang, Tekalp, Sezan)
      -> Update segmentation given motion field (Chang, Tekalp, Sezan)
         (the two update steps are iterated)
      -> Go to next frame

Updates are based on the MAP criterion using Gibbsian priors.
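The control flow of the chart can be sketched as an alternating loop; the four callables below are placeholders for the corresponding modules (dense estimation, segmentation, and the two MAP updates), not implementations of them:

```python
def alternate(frames, estimate_flow, segment, refine_flow, n_iter=3):
    """Skeleton of the alternating scheme: dense motion estimation, parametric
    motion segmentation, then iterated updates of the motion field given the
    segmentation and of the segmentation given the motion field."""
    results = []
    for gk, gk1 in zip(frames, frames[1:]):     # consecutive frame pairs
        v = estimate_flow(gk, gk1)              # e.g. Lucas-Kanade
        z = segment(v)                          # multi-stage segmentation
        for _ in range(n_iter):
            v = refine_flow(v, z, gk, gk1)      # update motion given labels
            z = segment(v)                      # update labels given motion
        results.append((v, z))
    return results
```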


Integration of Color and Motion Segmentation

[Figure: color regions 1-4 (solid edges) overlaid with motion classes A and B (dotted boundary)]

- Perform pixel-based motion segmentation (dotted line) to determine the number of motion classes and the parametric model for each class.
- Perform color segmentation to define regions bounded by edges (solid lines).
- Assign each color region to one of the motion classes, based on the motion criterion, the DFD criterion, or a combination of them.
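Assignment by the DFD criterion can be sketched as follows: for each color region, pick the motion class whose parametric model yields the smallest mean squared displaced-frame difference. The function and its signature are ours, assumed for illustration:

```python
import numpy as np

def assign_regions(gk, gk1, regions, motion_models):
    """Assign each color-segmented region to the motion class whose model best
    compensates it (smallest mean squared DFD). regions: (H, W) label map from
    color segmentation; motion_models: callables mapping pixel coordinates
    (xs, ys) in frame k+1 to the corresponding coordinates in frame k."""
    H, W = gk.shape
    assignment = {}
    for r in np.unique(regions):
        ys, xs = np.nonzero(regions == r)
        best, best_err = None, np.inf
        for k, model in enumerate(motion_models):
            xp, yp = model(xs, ys)
            xp = np.clip(np.round(xp).astype(int), 0, W - 1)
            yp = np.clip(np.round(yp).astype(int), 0, H - 1)
            err = np.mean((gk1[ys, xs] - gk[yp, xp]) ** 2)  # DFD energy
            if err < best_err:
                best, best_err = k, err
        assignment[r] = best
    return assignment
```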
