Department of Electrical Engineering, Hopeman 413, University of Rochester, Rochester, New York 14627. Ph: (716) 275-3774, FAX: (716) 473-0486, E-mail: tekalp@ee.rochester.edu

The fundamentals of digital video representation, filtering, and compression, including popular algorithms for 2-D and 3-D motion estimation, object tracking, frame rate conversion, deinterlacing, image enhancement, and the emerging international standards for image and video compression, with such applications as digital TV, web-based multimedia, videoconferencing, videophone, and mobile image communications. Also included are more advanced image compression techniques such as entropy coding, subband coding, and object-based coding.
PART 1: REPRESENTATION
Lecture 1 Introduction to Analog and Digital Video
Lecture 2 Time-Varying Image Formation Models
Lecture 3 Spatio-Temporal Sampling
Lecture 4 Sampling Structure Conversion

PART 2: MOTION ESTIMATION
Lecture 5 Optical Flow Methods
Lecture 6 Block-Based Methods
Lecture 7 Pel Recursive Methods
Lecture 8 Bayesian Methods
Lecture 9 Parametric Modeling and Motion Segmentation
Lecture 10 2-D Motion Tracking
Lecture 11 3-D Motion and Structure Estimation
Lecture 12 Stereo Video

PART 3: FILTERING
Lecture 13 Motion-Compensated Filtering
Lecture 14 Standards Conversion
Lecture 15 Noise Filtering
Lecture 16 Restoration
Lecture 17 Superresolution

PART 4: IMAGE COMPRESSION
Lecture 18 Fundamentals and Lossless Coding
Lecture 19 DPCM and Transform Coding
Lecture 20 Still Image Compression Standards
Lecture 21 Subband/Wavelet Coding and Vector Quantization

PART 5: VIDEO COMPRESSION
Lecture 22 Interframe Compression Methods
Lecture 23 Frame-Based Video Compression Standards
Lecture 24 Object-Based Coding and MPEG-4
Lecture 25 Digital Video Communication
Textbook:
Digital Video Processing, by A. Murat Tekalp, Prentice-Hall, 1995.
Supplementary Reading:
Video Engineering, by Inglis and Luther, Second Ed., McGraw-Hill, 1996. (covers fundamentals of analog and digital video systems, including HDTV, CATV, terrestrial and satellite video broadcast technologies.)
Video Dialtone Technology, by Minoli, McGraw-Hill, 1995. (covers digital video over ADSL, HFC, FTTC and ATM technologies, including interactive TV and video-on-demand.)
Grading:
Homeworks: 25%
Midterm Project: 25% (written report due Mar. 6)
Final Project: 50% (to be presented May 6-8; written report due May 12)
Prerequisites:
EE 446 and EE 447 or EE 241 and permission of the instructor.
1. Analog Video
2. Digital Video
3. Digital Video Standards
4. Digital Video Applications: Digital TV; PC Multimedia; Real-Time Communications
c 1995-97 This material is the property of A. M. Tekalp. It is intended for use only as a teaching aid when teaching a regular semester or quarter based course at an academic institution using the textbook "Digital Video Processing" (ISBN 0-13-190075-7) by A. M. Tekalp. Any other use of this material is strictly prohibited.
ANALOG VIDEO
One or more analog signals that contain a time-varying 2-D intensity (monochrome or color) pattern and the timing information to align the pictures.
Component Analog Video (CAV): RGB; YCrCb (YIQ or YUV)
Composite Video: NTSC (National Television System Committee); PAL (Phase Alternating Line); SECAM (SEquential Color And Memory)
S-Video (Y/C video): NTSC; PAL; SECAM
Frame rate and flicker: Each complete picture is called a frame (temporal sampling). The minimum frame rate required for flicker-free viewing is 50 Hz.
Progressive scan: Each frame is made up of lines (vertical sampling).
Interlaced scan, where each frame is split into two fields, provides a tradeoff between temporal and vertical resolution.
NTSC (USA, Japan, Canada, Mexico)
PAL (Great Britain)
PAL (Germany, Austria, Italy)
PAL (China)
SECAM (France, Russia)
Synchronization
Scanning at the display device must be synchronized with that at the source.
(Figure: horizontal sync pulse on the blanking interval, showing the sync, black, and white signal levels versus time, and the horizontal retrace.)
Blanking pulses are inserted during the retrace intervals to blank out retrace lines on the receiving CRT. Sync pulses are added on top of the blanking pulses to synchronize the receiver's horizontal and vertical sweep circuits. The timing of the sync pulses is different for interlaced and non-interlaced video.
Example: NTSC signal
Video bandwidth = 4.2 MHz
Line rate = (FR)(NL) = 29.97 x 525 = 15,734 lines/s
Fraction of the line time that is active = 53.5/63.5 = 0.84
HR = (2 x 4.2 x 10^6 / 15,734) x 0.84 = 448 pixels
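The arithmetic above can be checked with a short script (a sketch; 0.84 is the rounded active-line fraction used on the slide):

```python
# Horizontal resolution of NTSC from its line rate and video bandwidth.
frame_rate = 29.97               # frames/s
lines_per_frame = 525
video_bw = 4.2e6                 # Hz
active_fraction = 0.84           # 53.5 us active / 63.5 us total, rounded

line_rate = frame_rate * lines_per_frame            # lines/s
# Two pixels per cycle of the highest video frequency, over the active line time
h_resolution = 2 * video_bw / line_rate * active_fraction

print(round(line_rate))          # about 15,734 lines/s
print(round(h_resolution))       # about 448 pixels
```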
Electronic (CCD) video cameras: ITU-R standards 625/25 or 525/30; recorded on video tape.
Motion picture cameras: 24 frames/s; recorded on motion picture film.
Synthetic content: computer animation, graphics, etc.; formed by sequential ordering of a set of still-frame images.
' &
DIGITAL REVOLUTION
Digital data communications (e.g., computer networks, e-mail) and Digital audio (e.g., CD players, digital telephony)
What is next?
Digital video - as a form of computer data. Products such as digital TV/HDTV, videophone, and multimedia PCs will be in the marketplace soon.
[1] "Digital video," IEEE Spectrum, pp. 24-30, Mar. 1992.
' &
Let's look at the raw data rates for digital audio and video: CD-quality digital audio and high-definition video.
1280 pels x 720 lines (luma) + 2 x (640 pels x 360 lines) (chroma), at 60 frames/s x 8 bits/pel/channel: approximately 663.5 Mbps (from the GA-HDTV proposal)
(Table: digital video formats - number of active pels/line and active lines/picture for luma (Y) and chroma (U, V), interlacing, temporal rate, aspect ratio, and raw data rate in Mbps.)
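As a rough check of the raw-rate figures, the sketch below assumes a 1280 x 720 luma raster with two 640 x 360 chroma channels at 60 frames/s and 8 bits per sample - a geometry consistent with the quoted 663.5 Mbps total, though the exact format parameters are not spelled out here:

```python
# Raw (uncompressed) data rates for CD audio and high-definition video.
cd_audio = 44100 * 16 * 2                 # samples/s * bits/sample * channels
luma   = 1280 * 720 * 60 * 8              # pels * lines * frames/s * bits/pel
chroma = 2 * (640 * 360 * 60 * 8)         # two half-resolution chroma channels
hd_video = luma + chroma

print(cd_audio / 1e6)    # about 1.41 Mbps
print(hd_video / 1e6)    # about 663.55 Mbps
```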
The boom in the FAX market followed binary image compression standards.
DVI (Digital Video Interactive), Indeo, QuickTime, CD-I (Compact Disc Interactive), PhotoCD
A committee under the Society of Motion Picture and Television Engineers (SMPTE) is working to develop a universal header/descriptor that would make any digital video stream recognizable by any device. There are also digital recording standards, e.g., D1 (component video), D2 (composite video), etc.
Consumer/Commercial:
- All-digital HDTV at 20 Mbits/s over 6 MHz taboo channels
- Digital TV at 4-6 Mbits/s
- Multimedia, desktop video at 1.5 Mbits/s (CD-ROM or hard-disk storage)
- Videoconferencing at 384 kbits/s using p x 64 kbits/s ISDN channels
- Videophone and mobile image communications at 16 kbits/s using the copper network (POTS)
Other:
- Surveillance imaging (military or law enforcement)
- Intelligent vehicle highway systems and harbor traffic control
- Medical imaging (cine imaging)
- Education and scientific research
Digital TV
Choices for ATV broadcast channels: terrestrial broadcast; direct satellite broadcast; optical fiber cable broadcast.
Terrestrial broadcast channels are 6 MHz in the US and 8 MHz in Europe. A 6 MHz channel can support about a 20-30 Mbps data rate using sophisticated modulation techniques (e.g., QAM or VSB).
663.5 : 20 is approximately 33 : 1 compression. A single 6-MHz TV channel can support 4 or 5 standard-resolution digital TV programs (at 4-6 Mbits/s each).
PC Multimedia
Early technologies
CD-based interactive full-screen, full-motion video
- Digital Video Interactive (DVI) technology: hardware to handle full-motion video in PCs at about 1.5 Mbit/s
- VideoCD and Digital Video Disk (DVD)
Networked Multimedia / Video-on-Demand
[1] "Special report: Interactive multimedia," IEEE Spectrum, pp. 22-39, Mar. 1993.
[2] J. van der Meer, "The full motion system for CD-I," IEEE Trans. Cons. Electronics, vol. 38, no. 4, pp. 910-920, Nov. 1992.
[3] J. Sutherland and L. Litteral, "Residential video services," IEEE Comm. Mag., pp. 37-41, July 1992.
Real-Time Communications
Digital audio: The audio signal is sampled at 8 kHz and quantized with 8-12 bits/sample. Most telephone networks can carry a load of 14 kbps to 56 kbps; bit-rate reduction is achieved by coarser quantization.
Videoconferencing/videophone over ISDN: up to 2 Mbits/s using H.261 or H.263 compression.
Videophone over existing phone lines: 8-32 kbits/s using H.263 or H.263+ compression.
Video communications over future broadband ATM/access networks:
- Constant Bit Rate (CBR) channel - switched network
- Variable Bit Rate (VBR) channel - quality-of-service contract
- Available Bit Rate (ABR) channel - no guarantees, just like the Internet
Packet Video
The video bitstream is divided into elementary blocks (fixed or variable size), each containing a header and payload (data bits), e.g., MPEG-2 packets. Packet video allows
- interleaving video, audio, and data packets, and multiple programs in a single bitstream
- better error protection and resilience, and low delay
Network infrastructures: telephone networks; cable-TV networks; Internet (network of networks)
Modes of transmission: point-to-point transmission; multi-casting and broadcasting
Access Networks
Fiber-to-Home; Hybrid-Fiber-Coax (Cable Modem); Fiber-to-Curb (ADSL to home)
Some Access Network Bit-Rate Regimes:
- Conventional telephone modem: 28.8 kbps
- ISDN (Integrated Services Digital Network): 64-144 kbps (p x 64)
- T-1: 1.5 Mbps
- ADSL (Asymmetric Digital Subscriber Line): 1.5-6 Mbps downstream
- Cable modem: 30 Mbps downstream
- Ethernet (packet-based LAN): 10 Mbps
- Fiber B-ISDN/ATM: 55-200 Mbps
(i) Motion Analysis: 2-D motion/optical-flow estimation and segmentation; 3-D motion, structure estimation and segmentation; object tracking, occlusion, deformations.
(ii) Filtering and Standards Conversion: deblurring, noise filtering, edge sharpening; frame rate conversion and deinterlacing; resolution enhancement.
(iii) Compression: JPEG, H.261/H.263, MPEG 1-2; subband/wavelet and model-based coding.
TOKEN TRACKING
2-D Trajectory Model: Describe temporal evolution of selected feature points, e.g.,
$$x_1(k+1) = x_1(k)\cos\theta(k) - x_2(k)\sin\theta(k) + t_1(k)$$
$$x_2(k+1) = x_1(k)\sin\theta(k) + x_2(k)\cos\theta(k) + t_2(k)$$
with a 2-D rotation by the angle $\theta(k)$ and translation by $t_1(k)$ and $t_2(k)$.
Observation Model: Determine a number of feature correspondences over multiple frames, e.g., by block matching.
Batch or Recursive Estimation: Find the best motion parameters consistent with the model and observations. Batch estimators, e.g., the nonlinear least-squares estimator, process the entire data record at once after all data is collected. Recursive estimators, e.g., Kalman filters, process each observation as it becomes available to update the motion parameters.
Each line segment is represented by a 4-D feature vector $\mathbf{p} = [\mathbf{p}_1\ \mathbf{p}_2]^T$ consisting of the two end points, $\mathbf{p}_1$ and $\mathbf{p}_2$. The 2-D trajectory of the endpoints is modeled by the constant-acceleration model
$$x(k) = x(k-1) + v(k-1)\,\Delta t + \tfrac{1}{2}a(k-1)(\Delta t)^2$$
$$v(k) = v(k-1) + a(k-1)\,\Delta t$$
$$a(k) = a(k-1)$$
where $x(k)$, $v(k)$, and $a(k)$ denote the position, velocity, and acceleration of the pixel at time $k$, respectively. To perform tracking by a Kalman filter, we define the 12-dimensional state of the line segment as
$$\mathbf{z}(k) = [\,\mathbf{p}(k)\ \ \dot{\mathbf{p}}(k)\ \ \ddot{\mathbf{p}}(k)\,]^T$$
where $\dot{\mathbf{p}}(k)$ and $\ddot{\mathbf{p}}(k)$ denote the velocity and the acceleration of the coordinates, respectively.
Example: (cont'd)
The state propagation equation is
$$\mathbf{z}(k) = \Phi(k, k-1)\,\mathbf{z}(k-1) + \mathbf{w}(k), \qquad k = 1, \ldots, N$$
where
$$\Phi(k, k-1) = \begin{bmatrix} \mathbf{I}_4 & \mathbf{I}_4\,\Delta t & \tfrac{1}{2}\mathbf{I}_4(\Delta t)^2 \\ \mathbf{0}_4 & \mathbf{I}_4 & \mathbf{I}_4\,\Delta t \\ \mathbf{0}_4 & \mathbf{0}_4 & \mathbf{I}_4 \end{bmatrix}$$
$\mathbf{I}_4$ and $\mathbf{0}_4$ are 4 x 4 identity and zero matrices, respectively, and $\mathbf{w}(k)$ is a zero-mean, white process with covariance matrix $\mathbf{Q}(k)$. The observation equation is
$$\mathbf{y}(k) = \mathbf{p}(k) + \mathbf{v}(k), \qquad k = 1, \ldots, N$$
It is assumed that the noisy observations can be estimated from pairs of frames using some token-matching algorithm.
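As an illustration (a reduced sketch, not the 12-dimensional line-segment tracker above), here is a minimal constant-acceleration Kalman filter for a single coordinate; the noise covariances and measurements are made up:

```python
# Kalman predict/update cycle with state [position, velocity, acceleration].
import numpy as np

dt = 1.0
Phi = np.array([[1, dt, 0.5 * dt**2],     # constant-acceleration transition
                [0, 1,  dt],
                [0, 0,  1]])
H = np.array([[1.0, 0.0, 0.0]])           # we observe position only
Q = 1e-4 * np.eye(3)                      # process-noise covariance (assumed)
R = np.array([[1e-2]])                    # observation-noise covariance (assumed)

z = np.zeros(3)                           # state estimate
P = np.eye(3)                             # error covariance

for y in [0.5, 2.1, 4.4, 8.2, 12.4]:      # noisy position observations y(k)
    # predict
    z = Phi @ z
    P = Phi @ P @ Phi.T + Q
    # update with observation y(k) = p(k) + v(k)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    z = z + (K @ (np.array([y]) - H @ z)).ravel()
    P = (np.eye(3) - K @ H) @ P

print(z[0])   # filtered position after the last observation
```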
BOUNDARY TRACKING
Polygon tracking (by tracking corners)
Splines and active contours:
- Propagate joint points by their motion vectors
- Define various energy functions to snap the propagated snake to the contour in the next frame
OBJECT TRACKING
Object-Based Editing -> Synthetic Transfiguration
Object-Based Coding -> MPEG-4
Content-Based Retrieval -> Digital Libraries
3-D Object Modeling -> Virtual Reality
Triangle-Based Affine MC
Standard translational block matching cannot handle rotation and zooming. Neighboring relationships in the reference frame are preserved in the target frame. (Mesh elements do not overlap each other.)
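To make the idea concrete, the following sketch solves for the six affine parameters of one mesh triangle from its three vertex correspondences (the coordinates are made up for illustration):

```python
# Affine parameters of a triangle from three vertex correspondences.
import numpy as np

src = np.array([(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)])   # triangle in frame k-1
dst = np.array([(1.0, 1.0), (9.0, 2.0), (0.0, 9.0)])   # same triangle in frame k

# x' = a1*x + a2*y + a3,  y' = a4*x + a5*y + a6  at each vertex
A = np.zeros((6, 6))
b = np.zeros(6)
for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
    A[2*i]     = [x, y, 1, 0, 0, 0]
    A[2*i + 1] = [0, 0, 0, x, y, 1]
    b[2*i], b[2*i + 1] = xp, yp

a = np.linalg.solve(A, b)

# The mapping reproduces the vertices exactly; interior points are
# interpolated, so neighboring triangles stay connected (no overlap).
warp = lambda x, y: (a[0]*x + a[1]*y + a[2], a[3]*x + a[4]*y + a[5])
print(warp(8.0, 0.0))
```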
(Figure: texture mapping of triangular mesh elements from frame k-1 to frame k.)
(Figure: mesh with marked and unmarked pixels over regions of high and low temporal activity.)
Node-Point Selection
1. Estimate the 2-D forward dense motion field; find and polygonize the BTBC (background-to-be-covered) region.
2. Label all pixels within the BTBC polygon "marked," and include its corners in the list of node points.
3. Compute the average DFD over the unmarked region.
4. Compute a cost function C(x, y) over the unmarked region.
5. Select the unmarked pixel with the highest C(x, y) which is not closer to any of the existing node points than a prespecified distance as the next node point.
6. Grow a region about this node point until the sum of the absolute DFD reaches a threshold. Label all points within this region as "marked."
7. Continue until the maximum number of node points is reached, or all pixels are "marked."
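A toy version of the greedy node-point selection loop, using a synthetic cost array in place of the DFD-based cost C(x, y) and a plain minimum-distance test:

```python
# Greedy selection of node points: pick the highest-cost pixel that is at
# least `min_dist` away from all previously selected nodes.
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((32, 32))          # stand-in for the DFD-based cost function
min_dist, max_nodes = 6, 8

nodes = []
# visit pixel coordinates in order of decreasing cost
order = np.dstack(np.unravel_index(np.argsort(C, axis=None)[::-1], C.shape))[0]
for y, x in order:
    if all((y - ny)**2 + (x - nx)**2 >= min_dist**2 for ny, nx in nodes):
        nodes.append((y, x))
        if len(nodes) == max_nodes:
            break

print(len(nodes), nodes[0])
```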
- Sampling from the dense motion field
- Logarithmic/hexagonal search (hierarchical)
- Closed-form connectivity-preserving solutions
Select a polygon enclosing the region of interest.
Overlay a 2-D mesh (e.g., a uniform triangular mesh).
(Figure: reference frame, previous frame, and current frame in mesh tracking.)
Assumption: mild deformations. Define a cost polygon about each boundary node. Estimate the motion vector using deformable block matching.
(Figure: previous and current polygons with mesh patches A1, A2.)
Propagate each node using the affine mapping of the corresponding patch.
Use hexagonal matching to refine the location of each node.
Each node point is assigned a pair of parameters: an intensity scale factor and an offset c. Their values at any x are bilinearly interpolated.
(Figure: mesh fitting and the synthesized video.)
(Figure: BTBC region between frame k and frame k+1.)
- No node points within the BTBC region
- Mesh propagation with node-point motion vectors
- Model-failure detection (ideally, MF region = UB region)
- Mesh refinement within the MF region
(Figure: BTBC and UB (uncovered background) regions between frame k and frame k+1.)
- Use mesh elements from one object at a time only
- More than one motion vector for some nodes on the boundary
- BTBC regions should map onto a curve segment in the next frame
Each object is tracked independently. Uncovered areas are either assigned to one of the existing objects, or to a new object. Object mosaicing.
LECTURE 2
(Figure: a video sequence partitioned into shots 1 through N.)
A video source is a collection of shots. A shot is a video clip recorded by an uninterrupted motion of a single camera. Shot boundaries can be clean (as in a camera break) or blurred over a few frames, as in special effects such as dissolves, wipes, fade-ins, and fade-outs.
Image Formation
Spatio-Temporal Sampling
The variation in the intensity of the images from frame to frame is due to:
- 3-D camera motion, e.g., zoom and pan
- 3-D object motion, e.g., local translation and rotation
- photometric effects of 3-D motion
- changes in the scene illumination
We neglect deformable body motion at this time.
3-D displacement of a point on a rigid object:
- in Cartesian coordinates, $(X_1, X_2, X_3)$: an affine transformation;
- in homogeneous coordinates: a linear transformation.

3-D rotation, translation, and scaling (zooming) of a rigid body can be represented by an affine transformation
$$\mathbf{X}' = \mathbf{S}\,\mathbf{R}\,\mathbf{X} + \mathbf{T}$$
where
$$\mathbf{X}' = \begin{bmatrix} X_1' \\ X_2' \\ X_3' \end{bmatrix} \quad\text{and}\quad \mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$$
denote the coordinates of a point at time instants $t_{k+1}$ and $t_k$, respectively, and
$$\mathbf{T} = \begin{bmatrix} T_1 \\ T_2 \\ T_3 \end{bmatrix}, \qquad \mathbf{S} = \begin{bmatrix} S_1 & 0 & 0 \\ 0 & S_2 & 0 \\ 0 & 0 & S_3 \end{bmatrix}$$
Rotation - Eulerian angles in Cartesian coordinates: An arbitrary rotation in 3-D space can be represented by the Eulerian angles $\theta$, $\phi$, and $\psi$ of rotation about the $X_1$, $X_2$, and $X_3$ axes, respectively.
(Figure: 90-degree rotations about each of the three coordinate axes.)
The matrices that describe clockwise rotations about the individual axes are given by
$$\mathbf{R}_\theta = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad \mathbf{R}_\phi = \begin{bmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{bmatrix}, \quad \mathbf{R}_\psi = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
An example: consider rotation about the $X_1$ axis by 90 degrees:
$$\begin{bmatrix} X_1' \\ X_2' \\ X_3' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\frac{\pi}{2} & -\sin\frac{\pi}{2} \\ 0 & \sin\frac{\pi}{2} & \cos\frac{\pi}{2} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
Recall that matrix multiplication is not commutative; thus, in composite rotations, the order of specifying the rotations is important.
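A quick numerical check of the rotation matrices above and of their non-commutativity:

```python
# Rotations about X1 and X3 applied in the two possible orders.
import numpy as np

def rot_x1(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_x3(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

a = np.pi / 2
p = np.array([0.0, 1.0, 0.0])

print(rot_x1(a) @ p)                # (0, 1, 0) maps to (0, 0, 1)
print(rot_x3(a) @ rot_x1(a) @ p)    # rotate about X1, then X3
print(rot_x1(a) @ rot_x3(a) @ p)    # reversed order gives a different point
```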
Assuming infinitesimal rotation from frame to frame, i.e., $\theta = \Delta\theta$, etc., and approximating $\cos\Delta\theta \approx 1$ and $\sin\Delta\theta \approx \Delta\theta$, etc., these matrices simplify to
$$\mathbf{R}_\theta \approx \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -\Delta\theta \\ 0 & \Delta\theta & 1 \end{bmatrix}, \quad \mathbf{R}_\phi \approx \begin{bmatrix} 1 & 0 & \Delta\phi \\ 0 & 1 & 0 \\ -\Delta\phi & 0 & 1 \end{bmatrix}, \quad \mathbf{R}_\psi \approx \begin{bmatrix} 1 & -\Delta\psi & 0 \\ \Delta\psi & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Rotation about an arbitrary axis in Cartesian coordinates: A 3-D rotation can be represented by an angle $\alpha$ about an axis through the origin, described by the directional cosines $n_1$, $n_2$, and $n_3$.
(Figure: rotation axis through the origin with direction $(n_1, n_2, n_3)$.)
Then,
$$\mathbf{R} = \begin{bmatrix} n_1^2 + (1 - n_1^2)\cos\alpha & n_1 n_2 (1-\cos\alpha) - n_3\sin\alpha & n_1 n_3 (1-\cos\alpha) + n_2\sin\alpha \\ n_1 n_2 (1-\cos\alpha) + n_3\sin\alpha & n_2^2 + (1 - n_2^2)\cos\alpha & n_2 n_3 (1-\cos\alpha) - n_1\sin\alpha \\ n_1 n_3 (1-\cos\alpha) - n_2\sin\alpha & n_2 n_3 (1-\cos\alpha) + n_1\sin\alpha & n_3^2 + (1 - n_3^2)\cos\alpha \end{bmatrix}$$
For an infinitesimal rotation $\alpha = \Delta\alpha$, $\mathbf{R}$ reduces to
$$\mathbf{R} \approx \begin{bmatrix} 1 & -n_3\Delta\alpha & n_2\Delta\alpha \\ n_3\Delta\alpha & 1 & -n_1\Delta\alpha \\ -n_2\Delta\alpha & n_1\Delta\alpha & 1 \end{bmatrix}$$
with
$$\Delta\theta = n_1\Delta\alpha, \qquad \Delta\phi = n_2\Delta\alpha, \qquad \Delta\psi = n_3\Delta\alpha.$$
Start with the 3-D displacement model for rotation and translation only,
$$\begin{bmatrix} X_1' \\ X_2' \\ X_3' \end{bmatrix} = \begin{bmatrix} 1 & -\Delta\psi & \Delta\phi \\ \Delta\psi & 1 & -\Delta\theta \\ -\Delta\phi & \Delta\theta & 1 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} + \begin{bmatrix} T_1 \\ T_2 \\ T_3 \end{bmatrix}$$
Taking the limit as $\Delta t \to 0$,
$$\lim_{\Delta t\to 0}\begin{bmatrix} (X_1' - X_1)/\Delta t \\ (X_2' - X_2)/\Delta t \\ (X_3' - X_3)/\Delta t \end{bmatrix} = \lim_{\Delta t\to 0}\begin{bmatrix} 0 & -\Delta\psi/\Delta t & \Delta\phi/\Delta t \\ \Delta\psi/\Delta t & 0 & -\Delta\theta/\Delta t \\ -\Delta\phi/\Delta t & \Delta\theta/\Delta t & 0 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} + \lim_{\Delta t\to 0}\begin{bmatrix} T_1/\Delta t \\ T_2/\Delta t \\ T_3/\Delta t \end{bmatrix}$$
yields
$$\begin{bmatrix} \dot{X}_1 \\ \dot{X}_2 \\ \dot{X}_3 \end{bmatrix} = \begin{bmatrix} 0 & -\Omega_3 & \Omega_2 \\ \Omega_3 & 0 & -\Omega_1 \\ -\Omega_2 & \Omega_1 & 0 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} + \begin{bmatrix} V_1 \\ V_2 \\ V_3 \end{bmatrix}$$
where $\Omega_i$ and $V_i$ denote the angular and translational velocities, respectively, for $i = 1, 2, 3$.
HOMOGENEOUS COORDINATES
Define the vectors $\mathbf{X}_h$ and $\mathbf{X}_h'$ in the homogeneous coordinates as
$$\mathbf{X}_h = \begin{bmatrix} kX_1 \\ kX_2 \\ kX_3 \\ k \end{bmatrix} \quad\text{and}\quad \mathbf{X}_h' = \begin{bmatrix} kX_1' \\ kX_2' \\ kX_3' \\ k \end{bmatrix}$$
Then the affine transformation in the Cartesian coordinates, $\mathbf{X}' = \mathbf{A}\mathbf{X} + \mathbf{T}$, can be expressed as a linear transformation in the homogeneous coordinates,
$$\mathbf{X}_h' = \tilde{\mathbf{A}}\,\mathbf{X}_h, \qquad \tilde{\mathbf{A}} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & T_1 \\ a_{21} & a_{22} & a_{23} & T_2 \\ a_{31} & a_{32} & a_{33} & T_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
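A small sketch of the homogeneous-coordinate form, with made-up values for A and T:

```python
# Affine transform X' = A X + T packed into a single 4x4 matrix.
import numpy as np

A = 2.0 * np.eye(3)                 # e.g., isotropic scaling (R = I, S = 2I)
T = np.array([1.0, -2.0, 0.5])      # translation

A_tilde = np.eye(4)
A_tilde[:3, :3] = A
A_tilde[:3, 3] = T                  # last row stays [0, 0, 0, 1]

X = np.array([3.0, 4.0, 5.0])
Xh = np.append(X, 1.0)              # homogeneous coordinates (k = 1)

print(A_tilde @ Xh)                 # equals A @ X + T, with a trailing 1
```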
Translation:
$$\mathbf{X}_h' = \tilde{\mathbf{T}}\,\mathbf{X}_h, \qquad \tilde{\mathbf{T}} = \begin{bmatrix} 1 & 0 & 0 & T_1 \\ 0 & 1 & 0 & T_2 \\ 0 & 0 & 1 & T_3 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Scaling (zooming):
$$\mathbf{X}_h' = \tilde{\mathbf{S}}\,\mathbf{X}_h, \qquad \tilde{\mathbf{S}} = \begin{bmatrix} S_1 & 0 & 0 & 0 \\ 0 & S_2 & 0 & 0 \\ 0 & 0 & S_3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Rotation:
$$\mathbf{X}_h' = \tilde{\mathbf{R}}\,\mathbf{X}_h, \qquad \tilde{\mathbf{R}} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
where $r_{ij}$ are the elements of the 3 x 3 rotation matrix $\mathbf{R}$.
Projection maps the 3-D world coordinates into the 2-D image-plane coordinates: $(X_1, X_2, X_3, t) \to (x_1, x_2, t)$.
- Projective camera -> perspective (central) projection
- Affine camera -> weak-perspective and orthographic projection
Projective Camera
There are three coordinate systems: camera, image, and world.
1. Camera Coordinate System: Perspective Projection
(Figure: perspective projection of the point $(X_c, Y_c, Z_c)$ onto the image point $(x_c, y_c)$.)
The center of projection coincides with the origin of the camera coordinates:
$$\frac{x_c}{f} = \frac{X_c}{Z_c} \quad\text{and}\quad \frac{y_c}{f} = \frac{Y_c}{Z_c}$$
Perspective projection is nonlinear in the Cartesian coordinates; however, it can be expressed as a linear operation in the homogeneous coordinates:
$$\begin{bmatrix} x_c \\ y_c \\ f \end{bmatrix} = \frac{f}{Z_c}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}$$
2. Image Coordinate System
(Figure: image coordinates $(x_i, y_i)$ and camera coordinates $(x_c, y_c)$, with principal point $(x_0, y_0)$.)
$$\begin{bmatrix} x_i - x_0 \\ y_i - y_0 \end{bmatrix} = \mathbf{C}\begin{bmatrix} x_c \\ y_c \end{bmatrix}$$
where $\mathbf{C}$ is called the camera calibration matrix, and the principal point $(x_0, y_0)$ is where the optic axis intersects the image plane.
3. World Coordinate System
(Figure: camera axes $(X_c, Y_c, Z_c)$ and world axes $(X_w, Y_w, Z_w)$ related by a rotation $\mathbf{R}$ and a translation $\mathbf{t}$.)
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
General pin-hole camera equation:
$$\begin{bmatrix} x_i \\ y_i \end{bmatrix} = f\begin{bmatrix} (\mathbf{R}_1\mathbf{X}_w + t_x)/(\mathbf{R}_3\mathbf{X}_w + t_z) \\ (\mathbf{R}_2\mathbf{X}_w + t_y)/(\mathbf{R}_3\mathbf{X}_w + t_z) \end{bmatrix} + \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$$
where $\mathbf{R}_i$ denotes the $i$th row of $\mathbf{R}$.
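The pin-hole equation can be sketched as a function; R, t, f, and the test point below are arbitrary illustration values:

```python
# World point -> camera frame -> perspective division -> image coordinates.
import numpy as np

def project(Xw, R, t, f, x0=0.0, y0=0.0):
    Xc = R @ Xw + t                     # world -> camera coordinates
    xi = f * Xc[0] / Xc[2] + x0         # divide by depth (R3 Xw + tz)
    yi = f * Xc[1] / Xc[2] + y0
    return xi, yi

R = np.eye(3)                           # camera aligned with the world
t = np.array([0.0, 0.0, 10.0])          # scene pushed 10 units along the axis
f = 2.0

print(project(np.array([1.0, 2.0, 0.0]), R, t, f))
```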
(Figure: image plane and lens center with focal length f; a 3-D point $(X_1, X_2, X_3)$ projects to the image point $(x_1, x_2)$.)
The camera coordinate system is aligned with the world coordinate system.
Weak-Perspective Projection
Let $Z_i = \mathbf{R}_3\mathbf{X} + D_z$; then the perspective projection is given by
$$\mathbf{x}^c = f\begin{bmatrix} (\mathbf{R}_1\mathbf{X} + D_x)/Z_i \\ (\mathbf{R}_2\mathbf{X} + D_y)/Z_i \end{bmatrix} + \begin{bmatrix} o_x \\ o_y \end{bmatrix}$$
If the average depth $Z_{ave} = \mathbf{R}_3\bar{\mathbf{X}} + D_z$ is such that $|Z_i - Z_{ave}| \ll Z_{ave}$, then
$$\mathbf{x}^c \approx \frac{f}{Z_{ave}}\begin{bmatrix} \mathbf{R}_1\mathbf{X} \\ \mathbf{R}_2\mathbf{X} \end{bmatrix} + \frac{f}{Z_{ave}}\begin{bmatrix} D_x \\ D_y \end{bmatrix} + \begin{bmatrix} o_x \\ o_y \end{bmatrix}$$
Affine Camera
An uncalibrated weak-perspective projection:
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} T_{11} & T_{12} & T_{13} & T_{14} \\ T_{21} & T_{22} & T_{23} & T_{24} \\ 0 & 0 & 0 & T_{34} \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}$$
In Cartesian coordinates,
$$\mathbf{x} = \mathbf{M}\mathbf{X} + \mathbf{t}$$
where $\mathbf{M}$ is a 2 x 3 matrix with elements $M_{ij} = T_{ij}/T_{34}$ and $\mathbf{t} = [T_{14}/T_{34}\ \ T_{24}/T_{34}]^T$.
Orthographic Projection
Let the image plane be parallel to the $X_1$-$X_2$ plane of the world coordinate system. Then, in Cartesian coordinates, $x_1 = X_1$ and $x_2 = X_2$; or, in vector-matrix notation,
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$$
All rays from the 3-D object (scene) to the image plane are parallel to each other.
Under a Lambertian surface model, the image intensity is proportional to $\mathbf{L}^T\mathbf{N}$, where $\mathbf{L} = (L_1, L_2, L_3)$ is the unit vector in the mean illuminant direction and $\mathbf{N}$ is the unit surface normal of the scene at position $(X_1, X_2, X_3(X_1, X_2))$, given by
$$\mathbf{N} = \frac{(-p, -q, 1)}{(p^2 + q^2 + 1)^{1/2}}$$
in which $p = \partial X_3/\partial x_1$ and $q = \partial X_3/\partial x_2$ are the partial derivatives of depth with respect to the image coordinates $x_1$ and $x_2$, respectively.
(Figure: photometric model.)
Note that the illuminant direction can also be expressed in terms of the tilt and slant angles as
$$\mathbf{L} = (\cos\tau\sin\sigma,\ \sin\tau\sin\sigma,\ \cos\sigma)$$
where $\tau$, the tilt angle of the illuminant, is the angle between the projection of $\mathbf{L}$ onto the image plane and the $x_1$ axis, and $\sigma$, the slant angle, is the angle between $\mathbf{L}$ and the positive $X_3$ axis.
Assuming that the mean illuminant direction $\mathbf{L}$ remains constant, we can express the change in intensity due to the photometric effects of the motion at the point $(X_1, X_2, X_3)$ as
$$\frac{d s_c(x_1, x_2, t)}{dt} = \mathbf{L}^T\,\frac{d\mathbf{N}}{dt}$$
Approximate $d\mathbf{N}/dt$ as
$$\frac{d\mathbf{N}}{dt} \approx \mathbf{N}(X_1', X_2', X_3') - \mathbf{N}(X_1, X_2, X_3) = \frac{(-p', -q', 1)}{(p'^2 + q'^2 + 1)^{1/2}} - \frac{(-p, -q, 1)}{(p^2 + q^2 + 1)^{1/2}}$$
where $p'$ and $q'$ denote the depth gradients after the motion; for an infinitesimal rotation they can be expressed in terms of $p$, $q$, and the incremental rotation angles.
1. Spatio-Temporal Sampling: 2-D sampling structures for analog video; 3-D sampling structures for digital video; analog-to-digital conversion
2. Spectral Characterization of Sampled Video: 2-D sampling on a rectangular grid; 2-D/3-D sampling on a lattice
3. Reconstruction of Continuous Video from Samples: digital-to-analog conversion
Spatio-Temporal Sampling
(Block diagram: source RGB -> RGB-to-YUV -> NTSC encoder -> composite signal -> NTSC decoder -> YUV-to-RGB -> display RGB.)
Consider the image-plane intensity distribution $s_c(x_1, x_2, t)$ as a function of three continuous variables. Then, video is sampled
- in the vertical and temporal dimensions (usually $x_2$ and $t$) by means of the scanning process, and
- in all three dimensions for digital processing, storage and transmission.
(Figure: interlaced vertical-temporal sampling lattice; each dot indicates a continuous line of video perpendicular to the plane of the page.)
Progressive Sampling
(Each dot indicates a pixel location, the numbers indicate the time of sampling.)
Field-Quincunx Sampling and Line-Quincunx Sampling
(Figures: field-quincunx and line-quincunx sampling patterns; samples labeled 1 and 2 by field, with the corresponding sampling matrices V in terms of $\Delta x_1$, $\Delta x_2$, and $\Delta t$.)
[1] E. Dubois, "The sampling and reconstruction of time-varying imagery with application in video systems," Proc. IEEE, vol. 73, no. 4, pp. 502-522, Apr. 1985.
Analog-to-Digital Conversion
- The minimum sampling frequency is 2 x 4.2 = 8.4 MHz (the Nyquist rate).
- The sampling rate should be an integral multiple of the line rate, so that samples in successive lines are aligned.
- For sampling the composite signal, the sampling frequency must be an integral multiple of the subcarrier frequency; this simplifies decoding (composite to RGB) of the sampled signal.
- For sampling component signals, there should be a single rate for the 525/30 and 625/50 systems; i.e., the sampling rate should be an integral multiple of both line rates, 29.97 x 525 = 15,734 Hz and 25 x 625 = 15,625 Hz.
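As a check of the last requirement, the common component sampling rate of 13.5 MHz (the ITU-R BT.601 choice) is an integer multiple of both line rates:

```python
# Samples per line at a 13.5 MHz sampling rate for both scanning systems.
fs = 13.5e6
line_rate_525 = 525 * 30000 / 1001      # exact NTSC line rate, ~15,734 Hz
line_rate_625 = 625 * 25                # 15,625 Hz

print(fs / line_rate_525)   # 858 samples per total line (525/30 system)
print(fs / line_rate_625)   # 864 samples per total line (625/50 system)
```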
(Figure: luma and chroma (U, V) sampling positions for the 4:4:4, 4:2:2, and 4:2:0 formats.)
2-D Sampling on a Rectangular Grid
$$s(n_1, n_2) = s_c(n_1\Delta x_1,\ n_2\Delta x_2), \qquad n_1, n_2 \in Z$$
The continuous-space Fourier transform pair is
$$S_c(F_1, F_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} s_c(x_1, x_2)\,e^{-j2\pi(F_1 x_1 + F_2 x_2)}\,dx_1\,dx_2$$
$$s_c(x_1, x_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_c(F_1, F_2)\,e^{\,j2\pi(F_1 x_1 + F_2 x_2)}\,dF_1\,dF_2$$
and the discrete-space pair, in normalized frequencies $f_1$, $f_2$, is
$$S(f_1, f_2) = \sum_{n_1=-\infty}^{\infty}\sum_{n_2=-\infty}^{\infty} s(n_1, n_2)\,e^{-j2\pi(f_1 n_1 + f_2 n_2)}$$
$$s(n_1, n_2) = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} S(f_1, f_2)\,e^{\,j2\pi(f_1 n_1 + f_2 n_2)}\,df_1\,df_2$$
Evaluating the samples from the continuous spectrum,
$$s(n_1, n_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_c(F_1, F_2)\,e^{\,j2\pi(F_1 n_1\Delta x_1 + F_2 n_2\Delta x_2)}\,dF_1\,dF_2$$
Define the normalized frequencies $f_1 = F_1\Delta x_1$ and $f_2 = F_2\Delta x_2$; then
$$s(n_1, n_2) = \frac{1}{\Delta x_1\Delta x_2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S_c\!\left(\frac{f_1}{\Delta x_1}, \frac{f_2}{\Delta x_2}\right)e^{\,j2\pi(f_1 n_1 + f_2 n_2)}\,df_1\,df_2 = \sum_{k_1}\sum_{k_2} S_Q(k_1, k_2)$$
where $S_Q(k_1, k_2)$ is defined as the same integral restricted to the unit square $-\tfrac{1}{2} + k_1 \le f_1 < \tfrac{1}{2} + k_1$ and $-\tfrac{1}{2} + k_2 \le f_2 < \tfrac{1}{2} + k_2$.
A change of variables $f_1' = f_1 - k_1$ and $f_2' = f_2 - k_2$ maps every square down to $(-\tfrac{1}{2}, \tfrac{1}{2}] \times (-\tfrac{1}{2}, \tfrac{1}{2}]$, and since $\exp\{-j2\pi(k_1 n_1 + k_2 n_2)\} = 1$,
$$s(n_1, n_2) = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\left[\frac{1}{\Delta x_1\Delta x_2}\sum_{k_1}\sum_{k_2} S_c\!\left(\frac{f_1 - k_1}{\Delta x_1}, \frac{f_2 - k_2}{\Delta x_2}\right)\right]e^{\,j2\pi(f_1 n_1 + f_2 n_2)}\,df_1\,df_2$$
Comparing with the inverse discrete transform, we conclude that
$$S(f_1, f_2) = \frac{1}{\Delta x_1\Delta x_2}\sum_{k_1}\sum_{k_2} S_c\!\left(\frac{f_1 - k_1}{\Delta x_1}, \frac{f_2 - k_2}{\Delta x_2}\right), \qquad -\tfrac{1}{2} \le f_1, f_2 < \tfrac{1}{2}$$
(Figure: (a) analog spectrum $S_c(F_1, F_2)$ with bandwidth B; (b) rectangular sampling grid with spacings $\Delta x_1$, $\Delta x_2$; (c) spectrum of the sampled signal, replicated at multiples of $1/\Delta x_1$ and $1/\Delta x_2$.)
An arbitrary periodic sampling geometry can be defined by the vectors $\mathbf{v}_1 = (v_{11}, v_{21})^T$ and $\mathbf{v}_2 = (v_{12}, v_{22})^T$, such that
$$x_1 = v_{11} n_1 + v_{12} n_2, \qquad x_2 = v_{21} n_1 + v_{22} n_2$$
(Figure: sampling lattice generated by $\mathbf{v}_1$ and $\mathbf{v}_2$.)
In vector-matrix form,
$$\mathbf{x} = \mathbf{V}\mathbf{n}, \qquad \mathbf{x} = (x_1, x_2)^T, \quad \mathbf{n} = (n_1, n_2)^T, \quad \mathbf{V} = [\mathbf{v}_1 | \mathbf{v}_2]$$
and the sampled signal is $s(\mathbf{n}) = s_c(\mathbf{V}\mathbf{n})$.
Notes:
1) The sampling matrix $\mathbf{V}$ for a given grid is not unique: $\hat{\mathbf{V}} = \mathbf{V}\mathbf{E}$, where $\mathbf{E}$ is an integer matrix with $|\det\mathbf{E}| = 1$, is also a sampling matrix for that grid.
2) The quantity $|\det\mathbf{V}|$ is unique and denotes the reciprocal of the sampling density.
For sampling on a lattice, the Fourier transform pair of the sampled signal is
$$S(\mathbf{f}) = \sum_{\mathbf{n}} s(\mathbf{n})\,e^{-j2\pi\mathbf{f}^T\mathbf{n}}, \qquad s(\mathbf{n}) = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} S(\mathbf{f})\,e^{\,j2\pi\mathbf{f}^T\mathbf{n}}\,d\mathbf{f}$$
where $\mathbf{f} = (f_1, f_2)^T$ and $\mathbf{F} = (F_1, F_2)^T$ denote the normalized and unnormalized frequency vectors. The integrations and summations in these relations are double integrations and summations.
Substituting $s(\mathbf{n}) = s_c(\mathbf{V}\mathbf{n})$, making the change of variables $\mathbf{f} = \mathbf{V}^T\mathbf{F}$ (with Jacobian $|\det\mathbf{V}|$), expressing the integration over the f-plane as a sum of integrations over the unit squares $(-\tfrac{1}{2}, \tfrac{1}{2}] \times (-\tfrac{1}{2}, \tfrac{1}{2}]$, and using $\exp\{-j2\pi\mathbf{k}^T\mathbf{n}\} = 1$, we conclude that
$$S(\mathbf{f}) = \frac{1}{|\det\mathbf{V}|}\sum_{\mathbf{k}} S_c\!\left(\mathbf{V}^{-T}(\mathbf{f} - \mathbf{k})\right)$$
or, equivalently, in unnormalized frequencies,
$$S_p(\mathbf{F}) = \frac{1}{|\det\mathbf{V}|}\sum_{\mathbf{k}} S_c(\mathbf{F} - \mathbf{U}\mathbf{k}), \qquad \mathbf{U}^T\mathbf{V} = \mathbf{I}$$
where $\mathbf{U}$ is the periodicity matrix and $\mathbf{I}$ is the identity matrix. The periodicity matrix can be expressed as $\mathbf{U} = [\mathbf{u}_1 | \mathbf{u}_2]$, where $\mathbf{u}_1$ and $\mathbf{u}_2$ are the periodicity vectors. Note that the above formulation is also valid for rectangular sampling, with the matrices $\mathbf{V}$ and $\mathbf{U}$ diagonal.
(Figure: (a) analog spectrum $S_c(F_1, F_2)$ with bandwidth B; (b) sampling lattice generated by $\mathbf{v}_1$, $\mathbf{v}_2$; (c) replicated spectrum centered on the reciprocal lattice generated by $\mathbf{u}_1$, $\mathbf{u}_2$.)
3-D Sampling on a Lattice
Let $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ be linearly independent vectors in the 3-D Euclidean space $R^3$. A lattice $\Lambda$ in $R^3$ is the set of all linear combinations of $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ with integer coefficients:
$$\Lambda = \{\, n_1\mathbf{v}_1 + n_2\mathbf{v}_2 + k\,\mathbf{v}_3 \;|\; n_1, n_2, k \in Z \,\}$$
With the sampling matrix $\mathbf{V} = [\mathbf{v}_1 | \mathbf{v}_2 | \mathbf{v}_3]$, the sampled video is
$$s(n_1, n_2, k) = s_c(\mathbf{V}\,[n_1\ n_2\ k]^T), \qquad (n_1, n_2, k) \in Z^3$$
Observe that $d(\Lambda) = |\det\mathbf{V}|$ denotes the reciprocal of the sampling density, and $\mathbf{V}$ is not unique.
Reciprocal lattice $\Lambda^*$: the lattice generated by the vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ satisfying
$$\mathbf{u}_i^T\mathbf{v}_j = \delta_{ij}, \qquad i, j = 1, 2, 3$$
or, equivalently, $\mathbf{U}^T\mathbf{V} = \mathbf{I}$.
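A small sketch computing the periodicity matrix U = V^{-T} and the sampling density for a made-up 2-D vertical-temporal sampling matrix:

```python
# Reciprocal-lattice (periodicity) matrix and sampling density from V.
import numpy as np

dx2, dt = 1.0, 1.0 / 60.0                 # line spacing, field period (assumed)
V = np.array([[2 * dx2, dx2],             # columns are the lattice vectors
              [0.0,     dt ]])

U = np.linalg.inv(V).T                    # satisfies U^T V = I

print(np.abs(np.linalg.det(V)))           # reciprocal of the sampling density
print(U.T @ V)                            # identity, up to round-off
```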
Unit Cell (Voronoi cell): the set of points that are closer to the origin than to any other sample point.
(Figure: Voronoi cell of a 2-D lattice.)
'
s
Let (n ) = (V
S
] )
T
n1 n 2 k
(f ) =
n (n ) exp :; 2 f 4 (n )2Z3
X
k
8 <
)2
, then
k
39 = 5
f 2 R3
and
s
(n ) =
k
T
;1 2
1 2
n (f ) exp : 2 f 4
j
T
8 <
39 = 5 df
(n ) 2
k
&
where f = V F is the normalized frequency. The Fourier transform of a signal sampled on a lattice is periodic with the replications centered at the sites of the reciprocal lattice . Note that 1 1 1 1 1] (; 1 ] 2 P , where P f 2 (; 2 2 2 2 ] (; 2 2 ] implies that F = 1 2 denotes the unit cell of the reciprocal lattice .
F F Ft
T
74
The continuous spatio-temporal Fourier transform pair is
$$S_c(\mathbf{F}) = \int_{R^3} s_c(\mathbf{x}, t)\exp\left\{-j2\pi\,\mathbf{F}^T\begin{bmatrix}\mathbf{x}\\ t\end{bmatrix}\right\}d\mathbf{x}\,dt, \qquad \mathbf{F}\in R^3$$
$$s_c(\mathbf{x}, t) = \int_{R^3} S_c(\mathbf{F})\exp\left\{j2\pi\,\mathbf{F}^T\begin{bmatrix}\mathbf{x}\\ t\end{bmatrix}\right\}d\mathbf{F}, \qquad (\mathbf{x}, t)\in R^3$$
The Fourier transform of the sampled signal is equal to an infinite sum of copies of the analog spectrum, shifted according to the reciprocal lattice:
$$S_p(\mathbf{F}) = \frac{1}{d(\Lambda)}\sum_{\mathbf{k}\in Z^3} S_c(\mathbf{F} + \mathbf{U}\mathbf{k}), \qquad \mathbf{U}^T\mathbf{V} = \mathbf{I}$$
(Figures: (a) progressive and (b) interlaced spatio-temporal sampling lattices and their reciprocal lattices.)
The periodicity matrices follow from $\mathbf{U} = \mathbf{V}^{-T}$: diagonal (with entries $1/\Delta x_1$, $1/\Delta x_2$, $1/\Delta t$) for progressive sampling, and containing an additional vertical-temporal offset term for interlaced sampling.
Sublattices: Let $\Lambda$ and $\Gamma$ be lattices. $\Lambda$ is a sublattice of $\Gamma$ if every point in $\Lambda$ is also a point of $\Gamma$. Then $d(\Lambda)$ is an integer multiple of $d(\Gamma)$. The quotient $d(\Lambda)/d(\Gamma)$ is called the index of $\Lambda$ in $\Gamma$ and is denoted by $(\Lambda : \Gamma)$. If $\Lambda$ is a sublattice of $\Gamma$, then $\Gamma^*$ is a sublattice of $\Lambda^*$.
A coset of $\Lambda$ in $\Gamma$ is a shifted copy of $\Lambda$:
$$\mathbf{c} + \Lambda = \left\{\mathbf{c} + \begin{bmatrix}\mathbf{x}\\ t\end{bmatrix} \;\middle|\; \begin{bmatrix}\mathbf{x}\\ t\end{bmatrix}\in\Lambda\right\}, \qquad \mathbf{c}\in\Gamma$$
The most general form of the sampling structure that we will study is the union $\Psi$ of certain cosets of a sublattice $\Lambda$ in a lattice $\Gamma$:
$$\Psi = \bigcup_{i=1}^{P}(\mathbf{c}_i + \Lambda), \qquad \mathbf{c}_1, \ldots, \mathbf{c}_P\in\Gamma$$
Note that $\mathbf{c}_i - \mathbf{c}_j\notin\Lambda$ for $i\neq j$.
(Figure: a union-of-cosets sampling structure with lattice vectors $\mathbf{v}_1$, $\mathbf{v}_2$ and offset $\mathbf{c}$.)
The function
$$\Phi(\mathbf{k}) = \sum_{i=1}^{P}\exp\{j2\pi\,\mathbf{k}^T\mathbf{U}^T\mathbf{c}_i\}$$
is constant over cosets of $\Gamma^*$ in $\Lambda^*$, and may be zero for some of these cosets, so the corresponding shifted versions of the analog spectrum are not present.
(Figure: reciprocal lattice with some spectral replications suppressed.)
Band-limited reconstruction of the analog video requires ideal low-pass filtering:
$$H(F_1, F_2) = \begin{cases}\Delta x_1\,\Delta x_2, & |F_1|\le\frac{1}{2\Delta x_1}\ \text{and}\ |F_2|\le\frac{1}{2\Delta x_2}\\ 0, & \text{otherwise}\end{cases}$$
(Figure: passband of the reconstruction filter, $|F_1|\le 1/(2\Delta x_1)$, $|F_2|\le 1/(2\Delta x_2)$.)
The reconstructed signal is
$$s_r(x_1, x_2) = \int_{-\frac{1}{2\Delta x_1}}^{\frac{1}{2\Delta x_1}}\int_{-\frac{1}{2\Delta x_2}}^{\frac{1}{2\Delta x_2}}\Delta x_1\,\Delta x_2\,S(F_1\Delta x_1, F_2\Delta x_2)\,e^{\,j2\pi(F_1 x_1 + F_2 x_2)}\,dF_1\,dF_2$$
Substituting the discrete spectrum,
$$s_r(x_1, x_2) = \sum_{n_1}\sum_{n_2} s(n_1, n_2)\,\Delta x_1\,\Delta x_2\int_{-\frac{1}{2\Delta x_1}}^{\frac{1}{2\Delta x_1}}\int_{-\frac{1}{2\Delta x_2}}^{\frac{1}{2\Delta x_2}} e^{-j2\pi(F_1 n_1\Delta x_1 + F_2 n_2\Delta x_2)}\,e^{\,j2\pi(F_1 x_1 + F_2 x_2)}\,dF_1\,dF_2$$
which evaluates to the separable sinc interpolation
$$s_r(x_1, x_2) = \sum_{n_1}\sum_{n_2} s(n_1, n_2)\,\frac{\sin\frac{\pi}{\Delta x_1}(x_1 - n_1\Delta x_1)}{\frac{\pi}{\Delta x_1}(x_1 - n_1\Delta x_1)}\cdot\frac{\sin\frac{\pi}{\Delta x_2}(x_2 - n_2\Delta x_2)}{\frac{\pi}{\Delta x_2}(x_2 - n_2\Delta x_2)}$$
Exact reconstruction of a continuous signal from its samples on a lattice is possible via ideal low-pass filtering over a unit cell $\mathcal{P}$ of $\Lambda^*$, provided that the original continuous image spectrum was confined to this unit cell. The ideal low-pass filtering can be expressed as
$$S_r(\mathbf{F}) = \begin{cases}|\det\mathbf{V}|\;S(\mathbf{V}^T\mathbf{F}), & \mathbf{F}\in\mathcal{P}\\ 0, & \text{otherwise}\end{cases}$$
so that
$$s_r(\mathbf{x}, t) = \sum_{(\mathbf{n}, k)\in Z^3} s(\mathbf{n}, k)\;h\!\left(\begin{bmatrix}\mathbf{x}\\ t\end{bmatrix} - \mathbf{V}\begin{bmatrix}\mathbf{n}\\ k\end{bmatrix}\right)$$
where
$$h(\mathbf{x}) = |\det\mathbf{V}|\int_{\mathcal{P}}\exp\left\{j2\pi\,\mathbf{F}^T\begin{bmatrix}\mathbf{x}\\ t\end{bmatrix}\right\}d\mathbf{F}$$
Here $h(\mathbf{x})$ is the ideal interpolation function for the particular lattice geometry.
Applications
- Frame-rate conversion
- Deinterlacing (interlaced -> progressive)
- Interlacing
- NTSC-to-PAL transcoding, or vice versa
- Data compression (U, V subsampling)
Fundamentals of Decimation/Interpolation

(Block diagram: u(n) → Upsample 1:L → s(n) → lowpass filter → w(n) → Downsample M:1 → y(n); overall rate conversion by the rational factor L/M.)

Interpolation: upsampling by 1:L inserts L−1 zeros between the input samples,

    s(n) = u(n/L)  for n = 0, ±L, ±2L, ...
    s(n) = 0       otherwise.

(Figure: upsampling by L = 3.)
In the frequency domain,

    U(f) = Σ_{n=−∞}^{∞} u(n) e^{−j2π f n}

and

    S(f) = Σ_{n=−∞}^{∞} s(n) e^{−j2π f n} = Σ_{n=−∞}^{∞} u(n) e^{−j2π f L n} = U(fL)

(Figure: spectrum of the signal upsampled by L = 3; the baseband spectrum is compressed by L and spectral images appear at multiples of 1/L.)
(Figure: interpolation by L = 3; the interpolation filter with cutoff 1/(2L) removes the spectral images between −1/2 and 1/2.)

The impulse response of the ideal interpolation filter is a sinc function. Because of its zero-crossings, it will not alter the existing signal samples, while assigning values to the zero samples in the upsampled signal.
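The zero-insertion step and the spectrum relation S(f) = U(fL) can be verified numerically; a minimal sketch (the helper names are ours, not from the text):

```python
import numpy as np

def upsample(u, L):
    """Insert L-1 zeros between samples: s(n) = u(n/L) for n = 0, +-L, ..."""
    s = np.zeros(len(u) * L)
    s[::L] = u
    return s

def dtft(x, f):
    """Discrete-time Fourier transform X(f) = sum_n x(n) e^{-j 2 pi f n}."""
    n = np.arange(len(x))
    return np.sum(x * np.exp(-2j * np.pi * f * n))

u = np.array([1.0, 2.0, 3.0, 4.0])
s = upsample(u, 3)

# Zero insertion compresses the frequency axis: S(f) = U(fL).
f = 0.1
assert np.isclose(dtft(s, f), dtft(u, 3 * f))
```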
Linear interpolation

(Figure: linear interpolation for L = 3 as convolution of the upsampled signal u(k) with a triangular kernel h(n − k) taking the values 1/3, 2/3, 1, 2/3, 1/3.)
Decimation

The band-limited signal w(n) is downsampled by retaining every Mth sample,

    y(n) = w(Mn)

(Figure: (a)-(c) the input, filtered, and decimated signals for decimation by M = 2.)
In the frequency domain,

    Y(f) = Σ_{n=−∞}^{∞} w(Mn) e^{−j2π f n} = (1/M) Σ_{k=0}^{M−1} W( (f − k)/M )

(Figure: spectrum after decimation by M = 2, over the period [−1/2, 1/2].)
Decimation Filters

(Figure: spectra S(f), W(f), and Y(f) over [−1/2, 1/2]; the decimation filter band-limits the signal to |f| < 1/(2M) before downsampling.)

Box filters are generally used instead of ideal lowpass filters for simplicity.
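The aliasing relation Y(f) = (1/M) Σ_{k} W((f − k)/M) can likewise be checked on a short random signal; a sketch with illustrative helper names:

```python
import numpy as np

def dtft(x, f):
    """X(f) = sum_n x(n) e^{-j 2 pi f n}."""
    n = np.arange(len(x))
    return np.sum(x * np.exp(-2j * np.pi * f * n))

def decimate(w, M):
    """Keep every Mth sample: y(n) = w(Mn)."""
    return w[::M]

rng = np.random.default_rng(0)
w = rng.standard_normal(8)
M = 2
y = decimate(w, M)

# Aliasing relation: Y(f) = (1/M) sum_{k=0}^{M-1} W((f - k)/M)
f = 0.13
lhs = dtft(y, f)
rhs = sum(dtft(w, (f - k) / M) for k in range(M)) / M
assert np.isclose(lhs, rhs)
```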
A single lowpass filter with cutoff frequency f_c = min{ 1/(2M), 1/(2L) } is sufficient. When L > M, the requirement to preserve the values of the existing samples must be incorporated into the filter design.
Practical Method

(Figure: 3:4 sampling-rate conversion, and 525-line to 625-line conversion.)
Sum of lattices:

    Λ1 + Λ2 = { x1 + x2 : x1 ∈ Λ1 and x2 ∈ Λ2 }

Intersection of lattices:

    Λ1 ∩ Λ2 = { x : x ∈ Λ1 and x ∈ Λ2 }

The intersection Λ1 ∩ Λ2 is the largest lattice which is a sublattice of both Λ1 and Λ2, while the sum Λ1 + Λ2 is the smallest lattice which contains both Λ1 and Λ2 as sublattices.
Conversion from the input lattice Λ1 to the output lattice Λ2 consists of upconversion (U), filtering, and downconversion (D).

Upconvert U: define

    w(x, t) = s(x, t)  for (x, t) ∈ Λ1
    w(x, t) = 0        for (x, t) ∈ (Λ1 + Λ2), (x, t) ∉ Λ1

Downconvert D:

    p(x, t) = D p_w(x, t) = p_w(x, t)  for (x, t) ∈ Λ2

Condition for the shift invariance of the filter: if the input is shifted by q, the output should also be shifted by q. We need q ∈ Λ1 ∩ Λ2. Thus, we assume that V1^{−1} V2 is a matrix of integers, i.e., Λ1 ∩ Λ2 is a lattice.
The Filter

The filtered signal on the sum lattice is

    p_w(x, t) = Σ_{(q,r) ∈ Λ1 + Λ2} w(q, r) h( [x; t] − [q; r] ),  (x, t) ∈ Λ1 + Λ2

Since w vanishes off Λ1, this reduces to

    p_w(x, t) = Σ_{(q,r) ∈ Λ1} s(q, r) h( [x; t] − [q; r] ),  (x, t) ∈ Λ1 + Λ2

and the output, after downconversion, is

    p(x, t) = Σ_{(q,r) ∈ Λ1} s(q, r) h( [x; t] − [q; r] ),  (x, t) ∈ Λ2

One period of the filter frequency response is given by the unit cell of (Λ1 + Λ2)*. In order to avoid aliasing, the passband of the lowpass filter is restricted to the smaller of the Voronoi cells of Λ1* and Λ2*.
(Figure: example input and output lattices with their sampling matrices V1 and V2, the sum and intersection lattices, and the reciprocal lattices Λ1* and Λ2* with the filter passband on the (F1, F2) plane.)

One period of the filter frequency response is given by the unit cell of (Λ1 + Λ2)*. In order to avoid aliasing, the passband of the lowpass filter is restricted to the Voronoi cell of Λ2*.
Example: Deinterlacing

(Figure: (a) interlaced input lattice, (b) progressive output lattice on the (x2, t) plane.)

The sampling matrices for the input and output grids are

    Vin  = [ Δx1    0     0
             0     2Δx2  Δx2
             0      0    Δt  ]

and

    Vout = [ Δx1    0     0
             0     Δx2    0
             0      0    Δt  ]
1. Projected Motion vs. Optical Flow
2. Occlusion and Aperture Problems
3. Optical Flow Equation
4. 2-D Motion Field Models, Nonparametric vs. Parametric
5. Lucas-Kanade Method
6. Smoothness Constraint, Horn-Schunck Method
7. Adaptive Methods
2-D Motion Estimation
  Correspondence estimation
  Optical flow estimation
  - Motion-compensated image filtering.
  - Motion-compensated image compression.
3-D Motion and Structure Estimation
  Based on point correspondences
  Optical flow-based or direct methods
  From stereo video
  - Virtual Reality, Synthetic-Natural Hybrid Imaging
  - Passive Navigation: A camera moves with respect to a fixed environment. Determine the 3-D structure of the environment and the motion parameters of the camera.
Two-D Motion

(Figure: perspective projection; a 3-D point P projects through the center of projection O to the image-plane point p, with world axes X1, X2, X3 and image coordinates x1, x2.)

There is 3-D motion between the objects in the scene and the camera.

(Figure: as P moves in 3-D over time, its projection p traces a 2-D trajectory in the image plane.)
The 2-D displacement field is a vector field consisting of the x1 and x2 components of the frame-to-frame "projected" displacement vectors d = [d1 d2]^T at each pixel.

(Figure: the projected position P = (x1, x2) of a scene point at times t − Δt, t, and t + Δt, with the displacement vector d between frames.)

The 2-D velocity field is a vector field consisting of the x1 and x2 components of the instantaneous velocity vectors at each pixel.
There must be sufficient gray level variations within the moving objects.
Optical flow estimation: determination of the apparent velocity v(x1, x2, t) of pixels from a pair of time-sequential 2-D images. The flow vectors may vary with the coordinates (space-varying flow) due to 3-D rotation, zoom, etc.

Correspondence Problem: finding the apparent displacement vectors d(x1, x2, t) between a pair of frames t and t' = t + Δt. Dense or feature correspondence estimation. (May also appear in the context of stereo disparity estimation.)

Global shift estimation: given two frames that are globally shifted with respect to each other, estimate the shift. There is one displacement vector for a pair of frames.
Theoretically, we can determine only the component of motion that is in the direction of the spatial image gradient (i.e., orthogonal to the edges), called the normal flow, at any pixel (the aperture problem).
Occlusion refers to the covering/uncovering of a surface due to the motion of an object.

E.g. 1: when an object translates:

(Figure, frames k and k+1: "background to be covered" — no region in the next frame matches this region; "uncovered background" — no motion vector points into this region.)

E.g. 2: when an object rotates about an axis parallel to the imaging plane.
Basic Idea: We can only observe and determine displacement that is orthogonal to the edges (in the direction of the intensity gradient).
If the intensity s_c(x1, x2, t) remains constant along the motion trajectory,

    d s_c(x1, x2, t) / dt = 0

where x1 and x2 vary with t according to the motion trajectory. Using the chain rule of differentiation,

    (∂s_c/∂x1)(x, t) v1(x) + (∂s_c/∂x2)(x, t) v2(x) + (∂s_c/∂t)(x, t) = 0

This is known as the optical flow equation or the optical flow constraint. It can alternatively be expressed as

    < ∇s_c(x, t), v(x) > + (∂s_c/∂t)(x, t) = 0

where ∇s_c(x, t) = [ (∂s_c/∂x1)(x, t)  (∂s_c/∂x2)(x, t) ]^T and < , > denotes the dot product.
Normal Flow

Is the OFE sufficient to uniquely specify the motion field? The OFE yields one scalar equation in two unknowns at each pixel.

(Figure: the loci of v satisfying the optical flow equation form a line in the (v1, v2) plane.)

The OFE determines, at each pixel, only the component of the flow vector that is in the direction of the spatial image intensity gradient, ∇s_c(x, t)/||∇s_c(x, t)||, because the component that is orthogonal to the spatial image gradient disappears under the dot product. This normal component is

    v⊥(x, t) = − [ ∂s_c(x, t)/∂t ] / ||∇s_c(x, t)||
Motion Models

Because of the ill-posed nature of the problem, motion estimation algorithms use additional assumptions (models) about the structure of the 2-D motion field.

Non-parametric models: some sort of smoothness or uniformity constraint on the 2-D motion field.

Quasi-parametric models: in 3-D rigid motion, six egomotion parameters constrain the local flow vector to lie along a specific line, while the local depth value is required to determine its exact value.

Parametric models: 3-D rigid motion of the image of a planar surface under orthographic projection can be described by a 6-parameter affine model, while under perspective projection it can be described by an 8-parameter nonlinear model. There exist more complicated models for quadratic surfaces.
Methods Based on the OFE: Constant intensity along the motion trajectory yields an equation in terms of spatio-temporal intensity gradients. Used in conjunction with appropriate spatio-temporal smoothness constraints.

Phase-Correlation Method: The linear term of the Fourier phase difference between the consecutive frames determines the motion estimates.

Block Matching Method: Matching fixed-size blocks between two frames based on a distance criterion. Extension to feature matching (e.g., edges, corners).

Pel-Recursive Methods: Gradient-based minimization of the displaced frame difference. Implicit use of a smoothness constraint. Extension to Wiener-type motion estimation.

Bayesian Methods: Probabilistic smoothness constraint in the form of Gibbs random fields.
Second-order methods additionally assume that the spatial image gradient remains constant along the motion trajectory,

    d ∇s_c(x, t) / dt = 0

which yields two more equations per pixel,

    [ ∂²s_c/∂x1²     ∂²s_c/∂x1∂x2 ] [ v1 ]      [ ∂²s_c(x,t)/∂t∂x1 ]
    [ ∂²s_c/∂x1∂x2   ∂²s_c/∂x2²   ] [ v2 ]  = − [ ∂²s_c(x,t)/∂t∂x2 ]
Lucas-Kanade Method

Assume that the flow is constant over a block B,

    v(x, t) = v(t) = [ v1(t)  v2(t) ]^T  for x ∈ B

and minimize the total squared error in the OFE over the block,

    E(v(t)) = Σ_{x ∈ B} [ (∂s_c/∂x1) v1(t) + (∂s_c/∂x2) v2(t) + ∂s_c/∂t ]²

Minimization of E with respect to v1(t) and v2(t) yields

    [ v̂1(t) ]   [ Σ_B (∂s_c/∂x1)(∂s_c/∂x1)   Σ_B (∂s_c/∂x1)(∂s_c/∂x2) ]^{-1} [ −Σ_B (∂s_c/∂x1)(∂s_c/∂t) ]
    [ v̂2(t) ] = [ Σ_B (∂s_c/∂x1)(∂s_c/∂x2)   Σ_B (∂s_c/∂x2)(∂s_c/∂x2) ]      [ −Σ_B (∂s_c/∂x2)(∂s_c/∂t) ]

where all partials are evaluated at (x, t) and the sums are over x ∈ B.
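The 2 × 2 normal equations above are easy to exercise on synthetic gradients; the sketch below (the function name is ours) recovers a known velocity when the OFE holds exactly over the block:

```python
import numpy as np

def lucas_kanade_block(Ix1, Ix2, It):
    """Solve the 2x2 normal equations of the OFE over a block.

    Ix1, Ix2, It are arrays of the spatial and temporal intensity
    partials sampled over the block B.
    """
    A = np.array([[np.sum(Ix1 * Ix1), np.sum(Ix1 * Ix2)],
                  [np.sum(Ix1 * Ix2), np.sum(Ix2 * Ix2)]])
    b = -np.array([np.sum(Ix1 * It), np.sum(Ix2 * It)])
    return np.linalg.solve(A, b)  # [v1, v2]

# Synthetic check: gradients generated so that the OFE holds exactly
# for the true velocity v = (0.5, -0.25).
rng = np.random.default_rng(1)
Ix1 = rng.standard_normal(25)
Ix2 = rng.standard_normal(25)
v_true = np.array([0.5, -0.25])
It = -(Ix1 * v_true[0] + Ix2 * v_true[1])  # OFE: Ix1 v1 + Ix2 v2 + It = 0
v = lucas_kanade_block(Ix1, Ix2, It)
assert np.allclose(v, v_true)
```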
Horn-Schunck Method

Minimize a weighted sum of the error in the OFE and a measure of departure from smoothness in the motion field,

    min_v E = ∫ ( E_of²(v) + α² E_s²(v) ) dx

to estimate the velocity vector at each pixel, where the integral is over the image support,

    E_of(v(x)) = < ∇s_c(x, t), v(x) > + ∂s_c(x, t)/∂t

and

    E_s²(v(x)) = ||∇v1(x)||² + ||∇v2(x)||² = (∂v1/∂x1)² + (∂v1/∂x2)² + (∂v2/∂x1)² + (∂v2/∂x2)²

The parameter α² (chosen heuristically) is a weight that controls the strength of the smoothness constraint. Larger values of α² increase the strength of the constraint, whereas smaller values relax the constraint.
The minimization of the functional E, using the calculus of variations, and approximation of the Laplacian of the velocity components by linear highpass filters, yields the following iterations:

    v1^{(n+1)}(x, t) = v̄1^{(n)}(x, t) − (∂s_c/∂x1) [ (∂s_c/∂x1) v̄1^{(n)} + (∂s_c/∂x2) v̄2^{(n)} + ∂s_c/∂t ] / [ α² + (∂s_c/∂x1)² + (∂s_c/∂x2)² ]

    v2^{(n+1)}(x, t) = v̄2^{(n)}(x, t) − (∂s_c/∂x2) [ (∂s_c/∂x1) v̄1^{(n)} + (∂s_c/∂x2) v̄2^{(n)} + ∂s_c/∂t ] / [ α² + (∂s_c/∂x1)² + (∂s_c/∂x2)² ]

where v̄ denotes a local average of the velocity field and all partials are evaluated at the point (x, t). The initial estimates of the velocities v1^{(0)}(x, t) and v2^{(0)}(x, t) can be obtained by the block matching technique. In the digital implementation of the algorithm, the derivatives are numerically estimated.
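One iteration of this scheme can be sketched as follows; the 4-neighbor averaging kernel stands in for the Laplacian approximation, and all names are illustrative:

```python
import numpy as np

def horn_schunck_step(v1, v2, Ix1, Ix2, It, alpha2):
    """One update of the velocity fields.

    v1, v2 are the current velocity component fields; Ix1, Ix2, It are
    the intensity partials; alpha2 is the smoothness weight.
    """
    # local average of each velocity component (simple 4-neighbor kernel)
    kernel = np.array([[0.0, 0.25, 0.0],
                       [0.25, 0.0, 0.25],
                       [0.0, 0.25, 0.0]])
    def avg(v):
        padded = np.pad(v, 1, mode="edge")
        out = np.zeros_like(v)
        for i in range(v.shape[0]):
            for j in range(v.shape[1]):
                out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
        return out
    v1b, v2b = avg(v1), avg(v2)
    common = (Ix1 * v1b + Ix2 * v2b + It) / (alpha2 + Ix1**2 + Ix2**2)
    return v1b - Ix1 * common, v2b - Ix2 * common
```

Applied repeatedly from a zero (or block-matching) initialization, the fields relax toward a solution that balances the OFE against smoothness.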
The partials ∂s_c/∂x1, ∂s_c/∂x2, and ∂s_c/∂t can be estimated by a forward difference, a backward difference, an average difference, or a local average of the average differences. Horn and Schunck proposed averaging four finite differences:

    ∂s_c/∂x1 = (1/4) { s_c(x1+1, x2, t) − s_c(x1, x2, t) + s_c(x1+1, x2+1, t) − s_c(x1, x2+1, t)
                       + s_c(x1+1, x2, t+1) − s_c(x1, x2, t+1) + s_c(x1+1, x2+1, t+1) − s_c(x1, x2+1, t+1) }

    ∂s_c/∂x2 = (1/4) { s_c(x1, x2+1, t) − s_c(x1, x2, t) + s_c(x1+1, x2+1, t) − s_c(x1+1, x2, t)
                       + s_c(x1, x2+1, t+1) − s_c(x1, x2, t+1) + s_c(x1+1, x2+1, t+1) − s_c(x1+1, x2, t+1) }

    ∂s_c/∂t  = (1/4) { s_c(x1, x2, t+1) − s_c(x1, x2, t) + s_c(x1+1, x2, t+1) − s_c(x1+1, x2, t)
                       + s_c(x1, x2+1, t+1) − s_c(x1, x2+1, t) + s_c(x1+1, x2+1, t+1) − s_c(x1+1, x2+1, t) }
Approximate s_c(x1, x2, t) by a linear combination of N basis polynomials in x1, x2, and t,

    ŝ_c(x1, x2, t) = Σ_{i=0}^{N−1} a_i φ_i(x1, x2, t)

where N is the number of the basis polynomials, a_i are the coefficients of the linear superposition, and φ_i(x1, x2, t) are the basis polynomials. Set N = 9, with the following basis functions:

    1, x1, x2, t, x1², x2², x1 x2, x1 t, x2 t

Then,

    ŝ_c(x1, x2, t) = a0 + a1 x1 + a2 x2 + a3 t + a4 x1² + a5 x2² + a6 x1 x2 + a7 x1 t + a8 x2 t.
The coefficients a_i, i = 0, ..., 8, are estimated by using the least squares method, which minimizes the error function

    e² = Σ_{n1} Σ_{n2} Σ_{n3} ( s_c(x1, x2, t) − Σ_{i=0}^{N−1} a_i φ_i(x1, x2, t) )² | x1 = n1 Δx, x2 = n2 Δx, t = n3 Δt

with respect to these coefficients. The summation is over a local neighborhood of the pixel. A typical case involves 50 pixels: 5x5 spatial windows in two consecutive frames. Once the coefficients a_i are estimated, image gradients can be found by simple differentiation,

    ∂ŝ_c(x1, x2, t)/∂x1 = a1 + 2 a4 x1 + a6 x2 + a7 t,  which at x1 = x2 = t = 0 equals a1
    ∂ŝ_c(x1, x2, t)/∂x2 = a2 + 2 a5 x2 + a6 x1 + a8 t,  which at x1 = x2 = t = 0 equals a2
    ∂ŝ_c(x1, x2, t)/∂t  = a3 + a7 x1 + a8 x2,           which at x1 = x2 = t = 0 equals a3

Estimating the coefficients a1, a2, and a3 of the first three non-constant basis polynomials is sufficient to estimate the gradients.
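The least-squares fit and the gradient read-off can be sketched as below; the window layout, the synthetic signal, and the function name are our own choices:

```python
import numpy as np

def gradient_by_polyfit(window):
    """Fit s_c over a local (x1, x2, t) window with the 9 basis
    polynomials 1, x1, x2, t, x1^2, x2^2, x1*x2, x1*t, x2*t and read
    the gradients at the window center from a1, a2, a3.

    `window` maps (x1, x2, t) offsets (centered at 0) to intensities.
    """
    coords = np.array(list(window.keys()), dtype=float)
    vals = np.array(list(window.values()), dtype=float)
    x1, x2, t = coords[:, 0], coords[:, 1], coords[:, 2]
    Phi = np.stack([np.ones_like(x1), x1, x2, t,
                    x1**2, x2**2, x1 * x2, x1 * t, x2 * t], axis=1)
    a, *_ = np.linalg.lstsq(Phi, vals, rcond=None)
    return a[1], a[2], a[3]  # ds/dx1, ds/dx2, ds/dt at the center

# 5x5 windows in two consecutive frames (50 samples), sampled from a
# signal whose true gradients at the origin are (2, -1, 3).
window = {}
for x1 in range(-2, 3):
    for x2 in range(-2, 3):
        for t in (0, 1):
            window[(x1, x2, t)] = 2*x1 - x2 + 3*t + 0.1*x1**2 + 0.05*x1*x2
assert np.allclose(gradient_by_polyfit(window), (2.0, -1.0, 3.0))
```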
Adaptive Methods

The Horn-Schunck algorithm imposes the optical flow and smoothness constraints globally on the entire image (or over the motion estimation window).

(Figure, frames k and k+1: "background to be covered" — no region in the next frame matches this region; "uncovered background" — no motion vector points into this region.)

The smoothness constraint does not hold in the direction perpendicular to an occlusion boundary. Several researchers proposed to impose the smoothness constraint along the boundaries but not perpendicular to the occlusion boundaries. These methods require the detection of moving object (occlusion) boundaries.
The block-translation model assumes that each block of pixels moves with a single displacement (d1, d2),

    s(n1, n2, k+1) = s(n1 + d1, n2 + d2, k)

1) To overcome the aperture problem, there must be sufficient gray level variation within the block.
2) This model is used in many practical applications, including world standards for video compression such as H.261 and MPEG, motion-compensated filtering in standards conversion, etc.
The cross-correlation between the frames k and k+1 is given by

    c_{k,k+1}(n1, n2) = s(n1, n2, k+1) ** s(−n1, −n2, k)

where ** denotes 2-D convolution. Taking the Fourier transform of both sides,

    C_{k,k+1}(f1, f2) = S_{k+1}(f1, f2) S_k*(f1, f2)

Normalizing C_{k,k+1}(f1, f2) by its magnitude,

    C̃_{k,k+1}(f1, f2) = S_{k+1}(f1, f2) S_k*(f1, f2) / | S_{k+1}(f1, f2) S_k*(f1, f2) |

Given the motion model

    S_{k+1}(f1, f2) = S_k(f1, f2) e^{−j2π (f1 d1 + f2 d2)}

we have

    C̃_{k,k+1}(f1, f2) = e^{−j2π (f1 d1 + f2 d2)}

so that the inverse transform c̃_{k,k+1}(n1, n2) is an impulse whose location gives the displacement (d1, d2).
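The derivation maps directly onto FFTs; a sketch assuming cyclic shifts (the boundary caveat below applies to real frames), with function names of our own:

```python
import numpy as np

def phase_correlation(frame_k, frame_k1):
    """Locate the peak of the inverse of the normalized cross-power
    spectrum; its position gives the (cyclic) displacement."""
    Sk = np.fft.fft2(frame_k)
    Sk1 = np.fft.fft2(frame_k1)
    C = Sk1 * np.conj(Sk)
    C /= np.maximum(np.abs(C), 1e-12)  # keep only the phase
    surface = np.real(np.fft.ifft2(C))
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    # map peaks in the upper half of the DFT range to negative shifts
    d = [p - N if p > N // 2 else p for p, N in zip(peak, surface.shape)]
    return tuple(d)

rng = np.random.default_rng(2)
frame_k = rng.standard_normal((64, 64))
frame_k1 = np.roll(frame_k, shift=(5, -3), axis=(0, 1))  # cyclic shift
print(phase_correlation(frame_k, frame_k1))  # -> (5, -3)
```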
Implementation Issues

Range of Displacement Estimates/Block Size: Since the DFT is periodic with the block size (N1, N2),

    d̂_i = d_i        if |d_i| ≤ N_i/2 (N_i even) or |d_i| ≤ (N_i − 1)/2 (N_i odd)
    d̂_i = d_i − N_i  otherwise.

The range of estimates is [−N_i/2 + 1, N_i/2] for N_i even. For example, to estimate displacements within a range [−31, 32], the block size should be at least 64 × 64.

Boundary Effects: To obtain a perfect impulse with the DFT, the shift must be cyclic. Since things disappearing at one end generally do not reappear at the other end, the impulses degenerate into peaks.
The displacement at the center of an N1 × N2 block in frame k is determined by searching for the location of the best matching block of the same size in the frame k+1. The search is limited to a search window.

(Figure: a block in frame k and its search window in frame k+1.)

Block matching algorithms differ in
- Matching criteria (maximum cross-correlation, minimum error)
- Search strategy
- Determination of block size (hierarchical, adaptive)
Matching Criteria

    MSE(d1, d2) = (1/(N1 N2)) Σ_{(n1,n2) ∈ B} [ s(n1 + d1, n2 + d2, k+1) − s(n1, n2, k) ]²

    MAD(d1, d2) = (1/(N1 N2)) Σ_{(n1,n2) ∈ B} | s(n1 + d1, n2 + d2, k+1) − s(n1, n2, k) |

The displacement estimate is [d̂1 d̂2]^T, the (d1, d2) which minimizes the MSE or MAD criterion.
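A full-search MAD matcher following these criteria can be sketched as below; the names and the boundary-handling choice are ours:

```python
import numpy as np

def full_search(frame_k, frame_k1, top, left, N, M):
    """MAD full search: best displacement in [-M, M]^2 for the N x N
    block of frame_k whose top-left corner is (top, left)."""
    block = frame_k[top:top + N, left:left + N]
    best, best_mad = (0, 0), np.inf
    for d1 in range(-M, M + 1):
        for d2 in range(-M, M + 1):
            r, c = top + d1, left + d2
            if (r < 0 or c < 0 or r + N > frame_k1.shape[0]
                    or c + N > frame_k1.shape[1]):
                continue  # skip candidates outside the frame
            cand = frame_k1[r:r + N, c:c + N]
            mad = np.mean(np.abs(cand - block))
            if mad < best_mad:
                best_mad, best = mad, (d1, d2)
    return best

rng = np.random.default_rng(3)
frame_k = rng.standard_normal((32, 32))
frame_k1 = np.roll(frame_k, shift=(2, -4), axis=(0, 1))
print(full_search(frame_k, frame_k1, top=12, left=12, N=8, M=7))  # -> (2, -4)
```

The nested loop makes the (2M+1)² cost of full search explicit, which motivates the fast search strategies on the following slides.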
Search Procedures

Usually the search area is limited to

    −M1 ≤ d1 ≤ M1  and  −M2 ≤ d2 ≤ M2

Full Search: calls for the evaluation of the matching criterion at (2M1 + 1)(2M2 + 1) distinct points for each block.

Three-Step Search

Cross-Search

(Figure: three-step search, illustrated for M1 = M2 = 7.) The number of steps depends on the maximum displacement vector allowed and the accuracy of estimation; e.g., a range of ±32 pixels with 0.5 pixel accuracy would require 6 steps (16, 8, 4, 2, 1, 0.5 pixels).
Cross-Search

(Figure: cross-search pattern; the numbers mark the step at which each point is evaluated.)

The distance between the search points is reduced if the best match is at the center of the cross or at the boundary of the search window.
Minimizing the MSE or MAD criteria can be viewed as imposing the optical flow constraint on the entire block. It is assumed that all pixels belonging to a block have a single translation vector, which is a special case of the local smoothness constraint (the same as in the Lucas-Kanade method).

Block size selection: There are conflicting requirements on the size of the blocks.
- The block size should be sufficiently large; otherwise, a match may be established between blocks containing similar gray-level patterns which are unrelated in the motion sense.
- The block size should be sufficiently small; if the motion vector varies within a block, block matching cannot provide accurate estimates.
(Figure: hierarchical block matching; the image is represented at multiple resolution levels, and estimates at the coarsest level (Level 1) are refined at successively finer levels.)
Hierarchical BM - An Example

The center of the search area in the second level (denoted by "0") denotes the estimate from the first level.

(Figure: search points visited in the first and second levels.)

The estimates in the 1st and 2nd levels are [7, 1]^T and [3, 1]^T, respectively, resulting in an estimate of [10, 2]^T.
(Figure: frames k and k+1 related by a purely translational block displacement.)

Limitations of translational block matching:
- It cannot handle rotation or zooming; accuracy is essential in motion-compensated filtering.
- The motion field is discontinuous at block boundaries, causing blocking artifacts in motion-compensated compression.
Spatial Transformations

Consider block-based image warping by
- Affine motion model (6-parameter).
- Perspective or bilinear motion model (8-parameter).

(Figure: a square block warped by affine, perspective, and bilinear transformations.)
Generalized block matching:
- Search method (Seferidis and Ghanbari)
- Algebraic method (extension of the Lucas-Kanade method)

2-D mesh modeling (motion continuity across block boundaries):
- Hexagonal search (Nakaya et al.)
- Constrained linear estimation (Altunbasak and Tekalp)
(Figure: generalized block matching between frame k−1 (reference frame) and frame k (current frame); the corners of each patch are searched over all combinations of their coordinates to minimize the SAD.)
Algebraic Method

Extension of the Lucas-Kanade method to parametric motion models. Affine Motion Model:

    d1(x) = a1 + a2 x1 + a3 x2
    d2(x) = a4 + a5 x1 + a6 x2,   (x1, x2) ∈ B

Minimize the OFE error over the block,

    E = Σ_{x ∈ B} ( I_{x1} d1(x) + I_{x2} d2(x) + I_t )²

where I_{x1}, I_{x2}, and I_t denote the spatial and temporal intensity partials. Differentiate E with respect to a1, ..., a6 and set the results equal to zero to obtain six linear equations in six unknowns,

    [ Σ φ(x) φ(x)^T ] [ â1 ... â6 ]^T = − Σ I_t φ(x),   φ(x) = [ I_{x1}, x1 I_{x1}, x2 I_{x1}, I_{x2}, x1 I_{x2}, x2 I_{x2} ]^T

whose coefficients are sums, over the block, of products such as Σ I_{x1}², Σ x1 I_{x1} I_{x2}, Σ x2² I_{x2}², and Σ I_{x1} I_t.
Affine motion with triangular patches:
- Hexagonal matching (search).
- Constrained linear estimation: all constraints are linear.

(Figure: deformation of a triangular mesh between frame k−1 and frame k.)

Hexagonal Matching

There are six lines intersecting at each node in the case of a uniform triangular mesh. The boundaries of these six triangles define a hexagon. Perturb each node point to yield the smallest SAD within its hexagon.
Let

    d(x) = [ d1(x)  d2(x) ]^T

denote the displacement field at x = [x1 x2]^T between frames t and t + Δt. The DFD function between these two frames is defined as

    dfd(x, d̂) = s_c(x + d̂(x), t + Δt) − s_c(x, t)

where s_c(·) denotes the spatio-temporal image intensity distribution. If the components of d̂ take noninteger values, interpolation is required to compute dfd at each pixel location. If d̂ is equal to the true displacement vector and there are no interpolation errors, dfd attains the value of zero at that pixel location.
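Computing the dfd at non-integer displacements with bilinear interpolation can be sketched as follows; the helper names and the ramp-image example are ours:

```python
import numpy as np

def bilinear(img, x1, x2):
    """Bilinearly interpolate img at non-integer coordinates (x1, x2)."""
    i, j = int(np.floor(x1)), int(np.floor(x2))
    a, b = x1 - i, x2 - j
    return ((1 - a) * (1 - b) * img[i, j] + a * (1 - b) * img[i + 1, j]
            + (1 - a) * b * img[i, j + 1] + a * b * img[i + 1, j + 1])

def dfd(frame_t, frame_t1, x, d):
    """dfd(x, d) = s_c(x + d, t + dt) - s_c(x, t); interpolation is
    needed when d has noninteger components."""
    return bilinear(frame_t1, x[0] + d[0], x[1] + d[1]) - frame_t[x[0], x[1]]

# On a linear ramp, a half-pel displacement is reproduced exactly by
# bilinear interpolation, so the dfd of the true displacement is zero.
n1, n2 = np.mgrid[0:16, 0:16]
frame_t = 3.0 * n1 + 2.0 * n2
true_d = (0.5, 1.5)
frame_t1 = 3.0 * (n1 - true_d[0]) + 2.0 * (n2 - true_d[1])  # shifted ramp
assert abs(dfd(frame_t, frame_t1, (7, 7), true_d)) < 1e-9
```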
Expanding s_c(x1 + d1(x), x2 + d2(x), t + Δt) into a Taylor series about (x, t) for small Δt, neglecting the higher-order terms (h.o.t.), and setting dfd(x, d̂) = 0, we obtain

    (∂s_c/∂x1) d1(x) + (∂s_c/∂x2) d2(x) + (∂s_c/∂t) Δt = 0

Dividing by Δt and letting Δt → 0 recovers the optical flow equation.
Comments

In the case of constant-velocity motion, where d1(x) = v1(x) Δt and d2(x) = v2(x) Δt, the optical flow equation is satisfied when the displaced frame difference function attains the value of zero.

In practice, neither the dfd nor the error in the OFE is exactly zero, because there is observation noise, scene illumination may vary with time, there are occlusion regions, and there are interpolation errors.

Therefore, one aims to minimize the absolute value or the square of the dfd, or of the LHS of the OFE, to obtain an estimate of the frame-to-frame motion field.
PEL-RECURSIVE ALGORITHMS

Pel-recursive algorithms are of the general form

    d^{i+1}(x) = d^i(x) + u^i(x)

where d^i(x) is the estimated motion vector at the pel location x in the ith step, u^i(x) is the update term in the ith step, and d^{i+1}(x) is the new estimate.

The update term u^i(x) is estimated, at each pel x, to minimize a positive-definite function E of the dfd with respect to d. The iterations may be executed at a single pel (pixel) position, at consecutive pel positions, or a combination of both.

The motion estimate at the previous pel is taken as the initial estimate at the next pel; hence, pel-recursive.
Setting the gradient of the criterion to zero,

    ∇_d E(x, d) = 0,  i.e.,  ∂E(x, d)/∂d1 = 0  and  ∂E(x, d)/∂d2 = 0

Since an analytical solution to these equations cannot be found in general, we resort to iterative methods.
The gradient vector points AWAY from the minimum. That is, in one dimension, its sign will be positive on an "uphill" slope. Thus, to get closer to the minimum, we can update our current vector as

    d^{(k+1)}(x) = d^{(k)}(x) − ε ∇_d E(x, d) |_{d^{(k)}(x)}

(Figure: if ε is too large, the iterate overshoots the minimum.)

If ε is too small, the iteration will take too long to converge; if it is too large, the algorithm will become unstable and start oscillating about the minimum.
Newton-Raphson Method

We can estimate a good value for ε using the well-known Newton-Raphson method for root finding,

    d^{(k+1)}(x) = d^{(k)}(x) − H^{−1} ∇_d E(x, d) |_{d^{(k)}(x)}

where H is the Hessian matrix,

    H_{ij} = ∂²E(x, d) / ∂d_i ∂d_j

In one dimension, we would like to find a root of E′(d). Expanding E′(d) in a Taylor series about the point d^{(k)},

    E′(d^{(k+1)}) = E′(d^{(k)}) + (d^{(k+1)} − d^{(k)}) E″(d^{(k)})

Since we want d^{(k+1)} to be a zero of E′, we set

    E′(d^{(k)}) + (d^{(k+1)} − d^{(k)}) E″(d^{(k)}) = 0

Thus,

    d^{(k+1)} − d^{(k)} = − E′(d^{(k)}) / E″(d^{(k)})
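The one-dimensional Newton step −E′/E″ can be sketched directly; the function names and the quartic test energy are illustrative:

```python
def newton_minimize(dE, d2E, d0, iters=20):
    """1-D Newton-Raphson on the derivative: each step moves by
    -E'(d)/E''(d), the step the Hessian rule prescribes."""
    d = d0
    for _ in range(iters):
        d = d - dE(d) / d2E(d)
    return d

# E(d) = (d - 2)^4 has its minimum at d = 2.
dE = lambda d: 4 * (d - 2) ** 3
d2E = lambda d: 12 * (d - 2) ** 2
print(round(newton_minimize(dE, d2E, d0=5.0, iters=60), 3))  # -> 2.0
```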
Netravali-Robbins Algorithm

The Netravali-Robbins algorithm finds an estimate of the displacement vector at each pixel to minimize

    E(x, d) = [ dfd(x, d) ]²

The gradient of the dfd with respect to d is

    ∇_d dfd(x, d^i) = ∇_x s_c(x − d^i, t − Δt)

so that the gradient-based update becomes

    d^{i+1}(x) = d^i(x) − ε dfd(x, d^i) ∇_x s_c(x − d^i, t − Δt)
Walker and Rao suggested the step size

    ε = 1 / || ∇_x s_c(x − d^i, t − Δt) ||²

Cafforio and Rocca have added a bias term η² to avoid division by zero in the areas of constant intensity,

    ε = 1 / ( || ∇_x s_c(x − d^i, t − Δt) ||² + η² )
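A 1-D toy version of the pel-recursive update with the Cafforio-Rocca step size; all names and the synthetic signal are ours, and the spatial gradient is estimated by a central difference:

```python
import numpy as np

def pel_recursive_1d(prev, curr, x, d0=0.0, eta2=1e-3, iters=50):
    """1-D sketch of the gradient-based update with the Cafforio-Rocca
    step size. prev/curr are callables giving the intensity of the
    previous and current frames."""
    d = d0
    for _ in range(iters):
        dfd = curr(x) - prev(x - d)
        grad = prev(x - d + 0.5) - prev(x - d - 0.5)  # central difference
        step = 1.0 / (grad**2 + eta2)
        d = d - step * dfd * grad
    return d

# A smooth 1-D intensity profile translated by 0.3 pels.
prev = lambda x: np.sin(0.4 * x)
true_d = 0.3
curr = lambda x: np.sin(0.4 * (x - true_d))
d_hat = pel_recursive_1d(prev, curr, x=2.0)
assert abs(d_hat - true_d) < 1e-3
```

The sign conventions here follow the 1-D toy setup (current frame shifted by +true_d); a real implementation must match the dfd definition used for the 2-D frames.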
The update can also be computed over a support M (a set of pels) by minimizing

    E_M(d_M) = Σ_{x ∈ M} [ dfd(x, d_M) ]²

so that

    d^{i+1}_M = d^i_M − ε ∇_d Σ_{x ∈ M} [ dfd(x, d^i) ]²

where d^{i+1}_M denotes the new displacement estimate over the entire support M.
Linear minimum mean square error (LMMSE) estimation of the update term u^i based on a neighborhood of a pel (an extension of the multiple-pel version of the Netravali-Robbins algorithm). Linearization of the dfd at the N pels of the support:

    dfd(x^{(1)}, d^i) = −∇^T s_c(x^{(1)} − d^i, t − Δt) u^i + v(x^{(1)}, d^i)
    dfd(x^{(2)}, d^i) = −∇^T s_c(x^{(2)} − d^i, t − Δt) u^i + v(x^{(2)}, d^i)
    ...
    dfd(x^{(N)}, d^i) = −∇^T s_c(x^{(N)} − d^i, t − Δt) u^i + v(x^{(N)}, d^i)

Expressing this set of equations as z = Φ u_M + v, the LMMSE estimate of the update term is given by

    û_M = [ Φ^T R_v^{−1} Φ + R_u^{−1} ]^{−1} Φ^T R_v^{−1} z

where û_M denotes the update term for the entire support. The solution requires the knowledge of the covariance matrices of both the update, R_u, and the linearization error, R_v. Assuming that R_u = σ_u² I and R_v = σ_v² I,

    û_M = [ Φ^T Φ + (σ_v²/σ_u²) I ]^{−1} Φ^T z

and

    d^{i+1}_M = d^i_M + [ Φ^T Φ + (σ_v²/σ_u²) I ]^{−1} Φ^T z

Note that the assumptions used to arrive at the simplified estimator are not in general true; e.g., the linearization error is not uncorrelated with the update term, and the updates and the linearization errors at each pixel are not uncorrelated with each other. However, experimental results indicate better performance than other pel-recursive estimators.
1. Introduction to Markov Random Fields and the Gibbs Distribution
2. Optimization Methods
   - Simulated Annealing (SA): Metropolis algorithm and Gibbs sampler
   - Iterated Conditional Modes (ICM)
   - Mean Field Annealing (MFA)
3. MAP Motion Estimation
   - Basic Formulation
   - Discontinuity Models
   - Estimation Algorithms
Definitions

Let a random field z = {z(x), x ∈ Λ} be specified over a lattice Λ, and ω ∈ Ω denote a realization of the random field z. The random field z(x) can be continuous- or discrete-valued; that is, ω(x) ∈ R or ω(x) ∈ Γ = {0, 1, ..., L−1}, for all x ∈ Λ, respectively.

A neighborhood system on Λ: the set N_x denotes the neighborhood of the site x, and has the properties: (i) x ∉ N_x, and (ii) x_j ∈ N_{x_i} if and only if x_i ∈ N_{x_j}, where x_i and x_j denote arbitrary sites in the lattice. (In words, x does not belong to its own set of neighbors, and if x_j is a neighbor of x_i, then x_i is a neighbor of x_j, and vice versa.) The neighborhood system over Λ is then defined as N = {N_x, x ∈ Λ}.
(Figure: (a) 4-point and (b) 8-point neighborhood systems.)

A clique C is defined as a subset of sites such that all pairs of sites in C are neighbors. Further, C denotes the set of all cliques.
(In words, the first condition implies all realizations have non-zero pdf, while the second states that the conditional pdf at a particular site depends only on its neighborhood.)

Difficulties with MRF models: (i) the joint pdf p(z) cannot be easily related to local properties, and (ii) it is hard to determine when a set of functions p(z(x_i) | z(x_j), x_j ∈ N_{x_i}), x_i ∈ Λ, are valid conditional pdfs [Geman and Geman].
A GRF with a neighborhood system N and the associated set of cliques C is characterized by the joint pdf

    p(z = ω) = (1/Q) e^{−U(z=ω)/T}

where, in the discrete-valued case,

    Q = Σ_ω e^{−U(z=ω)/T}

and, in the continuous-valued case,

    Q = ∫ e^{−U(z)/T} dz

and U(z), the Gibbs potential (Gibbs energy), is defined by

    U(z) = Σ_{C ∈ C} V_C( z(x) | x ∈ C ).
Example: Spatial smoothness constraint using a GRF. Let us use a 4-point neighborhood system and the 2-pixel cliques. Over a 4 × 4 lattice, there are a total of 24 such cliques. Let the 2-pixel clique potential be defined so that equal neighboring labels are favored.

(Figure: (b) a smooth realization with total potential V = −24, (c) a checkerboard realization with V = 24.)

Hammersley-Clifford (H-C) Theorem: Let N be a neighborhood system. Then z(x) is an MRF with respect to N if and only if p(z) is Gibbsian with the cliques induced by N.
The local conditional pdf, e.g., as used in the Gibbs sampler method for optimization, is defined as

    p( z(x_i) | z(x_j), x_j ∈ N_{x_i} ) = (1/Q_{x_i}) exp{ −(1/T) Σ_{C | x_i ∈ C} V_C( z(x) | x ∈ C ) }

where

    Q_{x_i} = Σ_{z(x_i) ∈ Γ} exp{ −(1/T) Σ_{C | x_i ∈ C} V_C( z(x) | x ∈ C ) }
OPTIMIZATION METHODS

Many estimation/segmentation problems require the minimization of an energy function E(d). We state the problem as

    Ê = min_d E(d)

where d is some N-dimensional parameter vector. The value of d that results in the minimal E is denoted by

    d̂ = arg min_d E(d)

This minimization is exceedingly difficult for image processing applications due to both the dimensions of the vectors involved and the occurrence of local minima, because E(d) is usually nonconvex.
Gradient descent suffers from a serious problem: its solution is strongly dependent on the starting point. If we start in a "valley", it will be stuck at the bottom of that valley. We have no way of getting out of that local minimum to reach the "global" minimum. Here we look at several optimization methods that are capable of finding the global optimum.

A. Simulated annealing (stochastic relaxation)
B. Iterated conditional modes (ICM) (by Besag)
C. Mean field annealing (MFA) (by Bilbro et al.)
Simulated Annealing

Simulated annealing, sometimes referred to as stochastic relaxation, belongs to the class of Monte Carlo methods. It enables us to find the global optimum of a nonconvex cost function of many variables. Here we describe two implementations:
- the original formulation of Metropolis, and
- the Gibbs sampler proposed by Geman and Geman.

The computational load of simulated annealing is usually significant, especially when the number of elements in the unknown vector d and the number of values in the set Γ are large.
We start at an arbitrary initial vector d. At each iteration cycle, all components of d are perturbed one by one by assigning each another value in the set Γ randomly. Note that the order in which the components are perturbed is not important, as long as all components are perturbed in each iteration cycle. The change in the total energy, ΔE, due to the perturbation is computed after each perturbation to determine whether this perturbation is accepted. A perturbation is accepted with probability P given by

    P = 1              if ΔE ≤ 0
    P = exp(−ΔE/T)     if ΔE > 0

where T is the temperature parameter that controls the probability of accepting positive changes in the energy. We always accept perturbations that lower the energy. The rationale behind accepting perturbations that increase the energy is to prevent the solution from settling in a local minimum.
If T is relatively big, the probability of accepting a positive energy change is higher than when T is small, given the same ΔE. In the next iteration cycle, the temperature is lowered, and the components are revisited. The process continues until the temperature has been lowered to near zero. A temperature "schedule", expressing temperature as a function of the iteration number, is therefore an important component of the stochastic relaxation process. Geman and Geman proposed the following schedule:

    T = C / ln(k + 1)

where C is a constant and k is the iteration cycle. This schedule is viewed as overconservative but guarantees a global minimum solution. Schedules that lower the temperature at a faster rate have been shown to work.
The Algorithm
1. Choose an initial value for d = d^(0). Set i = 0 and j = 1.
2. Perturb the jth component of d^(i) to generate the vector d^(i+1).
3. Compute ΔE = E(d^(i+1)) − E(d^(i)).
4. Compute the acceptance probability P.
5. If P < 1, then draw a random number that is uniformly distributed between 0 and 1. If the number drawn is less than P, accept the perturbation.
6. Set j = j + 1. If j ≤ N, go to 2. (N is the number of components of d.)
7. Set i = i + 1 and j = 1. Reduce T according to the temperature schedule. If T > Tmin, go to 2. Otherwise, terminate.
In Gibbs sampling, instead of making random perturbations and then deciding whether to accept or reject each perturbation, the new value is "drawn from" the local conditional distribution and is always accepted. First compute the conditional probability of the component d(x_i) to take each of the values α in the set Γ, given the present values of its neighbors, using

    P( d(x_i) = α | d(x_j), x_j ∈ N_{x_i} ) = (1/Q_{x_i}) exp{ −(1/T) Σ_{C | x_i ∈ C} V_C( d(x) | x ∈ C ) }

where

    Q_{x_i} = Σ_{α ∈ Γ} exp{ −(1/T) Σ_{C | x_i ∈ C} V_C( d(x) | x ∈ C ) }

Then, the new value of the component d(x_i) is drawn from this conditional probability distribution.
To clarify the meaning of "drawn from", suppose that the sample space is Γ = {0, 1, 2, 3}, and it was found that

    P(d(x_i) = 0) = 0.2,  P(d(x_i) = 1) = 0.1,  P(d(x_i) = 2) = 0.4,  P(d(x_i) = 3) = 0.3

A uniform random number, R, between 0 and 1 is generated. If 0 ≤ R ≤ 0.2 then d(x_i) = 0; if 0.2 < R ≤ 0.3 then d(x_i) = 1; if 0.3 < R ≤ 0.7 then d(x_i) = 2; and if 0.7 < R ≤ 1 then d(x_i) = 3.

Properties of perturbations through Gibbs sampling: (i) for any initial estimate, updating using the Gibbs sampler yields an asymptotically Gibbsian distribution. This result can be used to simulate a Gibbs random field with specified parameters. (ii) for a specified temperature schedule, the maximum of the Gibbs distribution will be reached. Although this property is significant for MAP estimation, the specified temperature schedule may be too slow for use in practice.
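The "drawn from" operation is inverse-CDF sampling; a sketch using the probabilities of the example (the function name is ours):

```python
import random

def draw_from(probs):
    """Draw an index from a discrete distribution by inverting its CDF
    with a uniform random number, as in the Gibbs sampler."""
    r = random.random()
    cumulative = 0.0
    for value, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return value
    return len(probs) - 1  # guard against round-off

random.seed(0)
probs = [0.2, 0.1, 0.4, 0.3]
samples = [draw_from(probs) for _ in range(20000)]
freq = [samples.count(v) / len(samples) for v in range(4)]
print([round(f, 2) for f in freq])  # close to [0.2, 0.1, 0.4, 0.3]
```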
Iterated Conditional Modes (ICM)

It can be shown that ICM converges to the solution that maximizes the local conditional probabilities

    P( d(x_i) = α | d(x_j), x_j ∈ N_{x_i} ) = (1/Q_{x_i}) exp{ −(1/T) Σ_{C | x_i ∈ C} V_C( d(x) | x ∈ C ) }

where

    Q_{x_i} = Σ_{α ∈ Γ} exp{ −(1/T) Σ_{C | x_i ∈ C} V_C( d(x) | x ∈ C ) }

at each site. Thus, ICM is usually implemented as in Gibbs sampling, but by choosing the value at each site that gives the maximum local conditional probability. ICM provides much faster convergence than SA. Also, when the initial solution is a reasonable estimate from other means rather than completely random, ICM reaches an acceptable solution in relatively few iterations. ICM produces good results for several applications that include image restoration [see Besag] and image segmentation [see Pappas].
'
Let
1
sk = fsk (x)g x 2 , denote the kth frame of video, d(x) = d (x) d (x)]T denote the displacement vector at site x, and d = fd (x)g and d = fd (x)g for x 2 , denote the lexicographic ordering of
1 2 1 2 2
&
the x1 and x2 components of the displacement eld from frame k ; 1 to k, respectively i.e., sk (x) = sk;1 (x ; d(x)): Then, the problem of motion estimation can be formulated as: given sk and sk;1 , nd an estimate of d1 and d2. The maximum a posteriori probability (MAP) estimates of d1 and d2 are given by: ^1 d ^ 2 ) = arg maxd1 d2 p(d1 d2jsk sk;1) (d 184
or

(d̂_1, d̂_2) = arg max_{d_1, d_2} p(s_k | d_1, d_2, s_{k-1}) p(d_1, d_2 | s_{k-1})
(d̂_1, d̂_2) = arg max_{d_1, d_2} p(s_{k-1} | d_1, d_2, s_k) p(d_1, d_2 | s_k)

The term p(s_k | d_1, d_2, s_{k-1}) is the conditional pdf, or the "consistency (likelihood) measure," which measures how well the estimates of d_1, d_2 explain the observations s_k given s_{k-1}. The term p(d_1, d_2 | s_{k-1}) is the a priori probability density, which is modeled by a GRF by specifying the clique potential functions according to the desired local properties of (d̂_1, d̂_2).
Discontinuity Models
Let us introduce two auxiliary fields, the occlusion field o and the line field l, to model the occlusion/uncovered areas and the optical flow boundaries, respectively, in order to improve the motion estimation results. The occlusion field is o = {o(x)}, x ∈ Λ.
The line field l(x_i, x_j) models the horizontal and vertical discontinuities in the motion field (optical flow) between the sites x_i and x_j. The line process l conceptually occupies the dual lattice, which has sites for lines between every pair of pixel sites. The state of each line site can be either ON (l = 1) or OFF (l = 0), expressing the presence and absence of a discontinuity, respectively. Nonnegative potentials are assigned to each rotation-invariant line-clique configuration to penalize excessive use of the "ON" state.
[Figure: rotation-invariant line-clique configurations a)-c) and their potentials: V = 0.0, V = 0.9, V = 1.8, V = 1.8, V = 2.7, V = 2.7]
Example: Prior probabilities with and without the line field. The prior potentials slightly penalize straight lines (V = 0.9), penalize corners (V = 1.8) and "T" junctions (V = 1.8), and heavily penalize the end of a line (V = 2.7) and "crosses" (V = 2.7).
The likelihood potential function puts no penalty on dissimilar pixel pairs if the line site in between is ON, and puts different amounts of penalty on different line configurations, reflecting our a priori expectation of their occurrence.
With the introduction of the auxiliary fields, the MAP estimate of {d_1, d_2, o, l} is given by:

(d̂_1, d̂_2, ô, l̂) = arg max_{d_1, d_2, o, l} p(d_1, d_2, o, l | s_k, s_{k-1})

Next, we discuss the likelihood (consistency) and the a priori probability models.
The prior model incorporates the location of the optical flow boundaries and the occlusion/uncovered areas while dictating that the flow vectors vary smoothly within each optical flow boundary. The a priori model can be expressed as

p(d_1, d_2, o, l | s_k) = exp{ -U(d_1, d_2, o, l | s_k) }

U(d_1, d_2, o, l | s_k) = λ_d U_d(d_1, d_2 | l) + λ_o U_o(o | l) + λ_l U_l(l | s_k)
  = λ_d Σ_{c ∈ C_d} V_c(d_1, d_2 | l) + λ_o Σ_{c ∈ C_o} V_c(o | l) + λ_l Σ_{c ∈ C_l} V_c(l | s_k)
Here C_d, C_o and C_l denote the sets of all cliques for the displacement, occlusion and line fields, respectively, V_c(·) represents the corresponding clique potential function, and λ_d, λ_o and λ_l are positive constants.
ESTIMATION ALGORITHMS
The minimization of the overall potential is an exceedingly difficult problem: there are several hundred thousand unknowns for a reasonably sized image, and the criterion function is nonconvex. For example, for a 256 × 256 image, there are 65,536 motion vectors (131,072 components), 65,536 occlusion labels, and 131,072 line field labels, for a total of 327,680 unknowns. An additional complication is that the motion vector components are continuous-valued, while the occlusion and line field labels are discrete-valued.
Three-step iteration of Dubois and Konrad:

1. Given the best estimates ô and l̂ of the auxiliary fields, update the motion field by minimizing
   min_{d_1, d_2} U(g_k | d_1, d_2, ô, g_{k-1}) + λ_d U_d(d_1, d_2 | l̂, g_k)

2. Given the best estimates d̂_1, d̂_2 and l̂, update o by minimizing
   min_o U(g_k | d̂_1, d̂_2, o, g_{k-1}) + λ_o U_o(o | l̂, g_k)
   An exhaustive search or the ICM method can be employed to solve this step.

3. Finally, given the best estimates d̂_1, d̂_2 and ô, update l by minimizing
   min_l U(g_k | d̂_1, d̂_2, l, g_{k-1}) + λ_o U_o(ô | l, g_k) + λ_l U_l(l | g_k)

Once all three fields are updated, the process is repeated until a suitable criterion of convergence is satisfied. This procedure has been reported to give good results.
Application of standard image segmentation methods directly to optical flow segmentation (i.e., using the velocity vector as a feature) may not be useful, since 3-D motion usually generates spatially varying optical flow fields. For example, within a purely rotating object there is no flow at the center of rotation, and the magnitude of the flow vectors increases as the distance of the points from the center of rotation increases.

Thus, optical flow segmentation needs to be based on some parametric description of the motion field.
[Figure: example scene containing two independently moving objects, a ball and a train]
Assume that the object surface is composed of planar patches:

a X_1 + b X_2 + c X_3 = 1

The 3-D rigid motion of the object is modeled as

[X_1', X_2', X_3']^T = R [X_1, X_2, X_3]^T + T

Then,

[X_1', X_2', X_3']^T = [a_1 a_2 a_3; a_4 a_5 a_6; a_7 a_8 a_9] [X_1, X_2, X_3]^T

where

A = R + T [a b c]
Scene Segmentation
Orthographic projection of the object coordinates onto the image plane yields

x_1' = a_1 x_1 + a_2 x_2 + a_3
x_2' = a_4 x_1 + a_5 x_2 + a_6

Perspective projection of the object coordinates onto the image plane yields

x_1' = (a_1 x_1 + a_2 x_2 + a_3) / (a_7 x_1 + a_8 x_2 + 1)
x_2' = (a_4 x_1 + a_5 x_2 + a_6) / (a_7 x_1 + a_8 x_2 + 1)
Assuming the scene is represented by a 3-D mesh (wireframe) model with planar patches, different parametric models are needed for:
- Different moving objects, which have different sets of 3-D rigid motion parameters.
- Different planar patches, which have different normal vectors.
Thresholding
Consider a bimodal histogram h(s) of an image s(x_1, x_2) composed of a light object on a dark background.

[Figure: bimodal histogram h(s), with threshold T between the two modes, between s_min and s_max]

To extract the object from the background, select a threshold T that separates these two dominant modes (peaks). Then

z(x_1, x_2) = 1 if s(x_1, x_2) > T, 0 otherwise

indicates the object and background pixels.
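The thresholding rule above can be sketched in a few lines: a minimal example in which the gray levels and the threshold T = 128 are assumed values chosen for illustration (in practice T is read off the valley between the histogram modes).

```python
import numpy as np

def threshold(image, T):
    """z = 1 where the gray level exceeds T (object), 0 otherwise (background)."""
    return (image > T).astype(np.uint8)

image = np.array([[200, 50],
                  [30, 220]])       # a tiny 2x2 "image" with two bright pixels
mask = threshold(image, 128)
```

The same one-liner generalizes to multilevel thresholding by comparing against a sorted list of M-1 thresholds instead of a single T.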
Multilevel Thresholding: If the histogram has M significant modes (peaks), where M > 2, then we need M - 1 thresholds to separate the image into M segments. Of course, reliable determination of the thresholds becomes more difficult as the number of modes increases.

Global/Local/Dynamic Thresholding: In general, the threshold T is a function of …
Suppose we wish to segment an image into K regions based on the gray values of the pixels. Let x = (x_1, x_2) denote the coordinates of a pixel, and s(x) denote its gray level.
[Figure: gray-level feature space with K = 2 cluster centers μ_1, μ_2]

The K-means method of clustering minimizes the performance index

J = Σ_{k=1}^{K} Σ_{x ∈ Γ_k^(i)} || s(x) - μ_k^(i+1) ||²
2. Assign each sample to the nearest cluster: s(x) ∈ Γ_j^(i) if ||s(x) - μ_j^(i)|| ≤ ||s(x) - μ_k^(i)|| for all k = 1, 2, ..., K, k ≠ j, where Γ_k^(i) denotes the set of samples whose cluster center is μ_k^(i).

3. Compute the new cluster centers μ_k^(i+1), k = 1, 2, ..., K, as the sample mean of all samples in Γ_k^(i):

μ_k^(i+1) = (1/N_k) Σ_{x ∈ Γ_k^(i)} s(x),   k = 1, 2, ..., K

where N_k is the number of samples in Γ_k^(i).

4. If μ_k^(i+1) = μ_k^(i) for all k = 1, 2, ..., K, the algorithm has converged, and the procedure is terminated. Otherwise, go to step 2.
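The K-means iteration above can be sketched for scalar gray levels: a minimal example with K = 2, where the sample gray levels and the initial centers are assumed values chosen for illustration.

```python
import numpy as np

def kmeans_1d(samples, centers, max_iter=100):
    """K-means on scalar samples: assign to nearest center, update means, repeat."""
    samples = np.asarray(samples, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.zeros(len(samples), dtype=int)
    for _ in range(max_iter):
        # step 2: assign each sample to the nearest cluster center
        labels = np.argmin(np.abs(samples[:, None] - centers[None, :]), axis=1)
        # step 3: recompute each center as the sample mean of its cluster
        new = np.array([samples[labels == k].mean() for k in range(len(centers))])
        if np.array_equal(new, centers):  # step 4: converged
            break
        centers = new
    return centers, labels

# two well-separated gray-level populations, initial centers at the extremes
centers, labels = kmeans_1d([10, 12, 11, 200, 205, 198], [0, 255])
```

Note this sketch assumes every cluster keeps at least one sample; production code would guard against empty clusters.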
MAP Segmentation
Clustering with Spatial Smoothness Constraints: Let z(x) denote the segmentation label at the pixel x, i.e., 1 ≤ z(x) ≤ K, and s(x) denote the gray level of the pixel. Define z and s to denote the lexicographic orderings of the segmentation label field and the gray level field, respectively. The maximum a posteriori probability (MAP) estimate of the segmentation label field maximizes the a posteriori probability of the segmentation labels given the pixel gray levels,

p(z | s) ∝ p(s | z) p(z)

where p(s | z) is the conditional probability density of the image gray levels given the pixel labels and p(z) is the prior density of the segmentation labels.
The prior pdf of the segmentation labels is modeled by a GRF

p(z) = (1/Q) Σ_ω exp{ -Σ_C V_C(ω) } δ(z - ω)

where Q is the partition function (normalizing constant) and the summation is over all cliques C. We consider only one- and two-point cliques. The single-pixel clique potentials are defined as

V_C(z(x)) = α_i   if z(x) = i and x ∈ C,   all i

They reflect our a priori knowledge of the probabilities of different region types: the smaller α_i, the higher the likelihood of region i. The two-point clique potentials are defined as

V_C(z(x_1), z(x_2)) = -β   if z(x_1) = z(x_2) and x_1, x_2 ∈ C
                       β   if z(x_1) ≠ z(x_2) and x_1, x_2 ∈ C

where β is a positive parameter, so that two neighboring pixels are more likely to belong to the same class than to different classes. The larger the value of β, the stronger the smoothness constraint.
The conditional density for region k is modeled as a white Gaussian process with mean μ_k and variance σ². Thus, the a posteriori density has the form

p(z | s) ∝ exp{ -(1/(2σ²)) Σ_x (s(x) - μ_{z(x)})² - Σ_C V_C(z) }

Maximization of this a posteriori density function with respect to z can be performed by simulated annealing. Observe that if we turn off the spatial smoothness constraints, the result is identical to the K-means algorithm.
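The local posterior energy above can be sketched directly: a minimal example with a Gaussian data term plus Potts-type two-point clique potentials, in which the class means, σ, and β are assumed values chosen for illustration. With β = 0 the minimizing label is simply the one with the nearest class mean (the K-means assignment); with a strong β a pixel can be pulled to its neighbors' class.

```python
def local_energy(s, label, neighbor_labels, means, sigma=10.0, beta=0.5):
    """Gaussian data term plus Potts clique potentials at one site."""
    data = (s - means[label]) ** 2 / (2 * sigma ** 2)
    clique = sum(-beta if nl == label else beta for nl in neighbor_labels)
    return data + clique

means = [50.0, 200.0]
# beta = 0: a pixel of gray level 60 picks class 0 (nearest mean), as in K-means
e0 = local_energy(60, 0, [1, 1, 1, 1], means, beta=0.0)
e1 = local_energy(60, 1, [1, 1, 1, 1], means, beta=0.0)
# strong beta: the same pixel surrounded by class-1 neighbors flips to class 1
f0 = local_energy(60, 0, [1, 1, 1, 1], means, beta=30.0)
f1 = local_energy(60, 1, [1, 1, 1, 1], means, beta=30.0)
```

Minimizing this energy site by site is exactly the ICM update; simulated annealing instead samples labels with probability proportional to exp(-energy/T).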
The MAP method can be made adaptive by letting the cluster means vary slowly with the pixel location x. Then

p(s | z) ∝ exp{ -Σ_x (s(x) - μ_{z(x)}(x))² / (2σ²) }

[Figure: segmentation labels (K = 2) and a local window about a pixel]

The quantities μ_k(x) are estimated at each site x for all k = 1, ..., K, as the sample mean of those pixels with label k within a local window about the pixel x.
Computational Issues

To reduce the computational load to a reasonable level, 1) the space-varying mean estimates are computed on a sparse grid and then interpolated, and 2) the optimization is performed via the ICM method. The algorithm starts with a window size equal to the image size and reduces the size of the window by 4 after each ICM optimization cycle. ICM is equivalent to maximizing the local a posteriori pdf

p(z(x_i) | s(x_i), z(x_j), all x_j ∈ N_{x_i}) ∝ exp{ -(1/(2σ²)) (s(x_i) - μ_{z(x_i)}(x_i))² - Σ_{C: x_i ∈ C} V_C(z) }
Ref: T. N. Pappas, "An Adaptive Clustering Algorithm for Image Segmentation," IEEE Trans. on Signal Processing, vol. SP-40, pp. 901-914, April 1992.
Multi-Channel Segmentation
Let y(x) = (v_1(x), v_2(x), s(x)). Assign a single label z(x) to each element of y(x) to maximize

p(z | y) ∝ p(y | z) p(z)

Assuming v_1, v_2, and s are conditionally independent given z, this results in

p(v_1, v_2, s | z) = exp{ -Σ_x [ (1/(2σ_1²)) (v_1(x) - μ^{v_1}_{z(x)}(x))² + (1/(2σ_2²)) (v_2(x) - μ^{v_2}_{z(x)}(x))² + (1/(2σ_3²)) (s(x) - μ^{s}_{z(x)}(x))² ] }

The prior pdf for z is a Gibbs distribution with a 4-pixel neighborhood system and 2-pixel cliques.
CHANGE DETECTION
Compare two images pixel by pixel by forming a difference image

FD_{k,k-1}(x_1, x_2) = s(x_1, x_2, k) - s(x_1, x_2, k-1)

Segment the scene into moving vs. stationary parts by thresholding the difference image:

z(x_1, x_2) = 1 if |FD_{k,k-1}(x_1, x_2)| > T, 0 otherwise

where T is an appropriate threshold. This approach assumes that the illumination remains more or less constant from frame to frame. This method may result in isolated 1s in the segmentation mask z(x_1, x_2) due to noise in the images.
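The change-detection rule above can be sketched as follows: a minimal example in which the two frames and the threshold T = 20 are assumed values chosen for illustration.

```python
import numpy as np

def change_mask(frame_k, frame_k1, T):
    """1 where |frame difference| exceeds T (moving), 0 elsewhere (stationary)."""
    fd = frame_k.astype(int) - frame_k1.astype(int)   # signed difference image
    return (np.abs(fd) > T).astype(np.uint8)

prev = np.array([[10, 10],
                 [10, 10]])
curr = np.array([[10, 90],
                 [12, 10]])   # one truly moving pixel, one small noise fluctuation
mask = change_mask(curr, prev, T=20)
```

Note the cast to a signed type before subtracting: differencing unsigned image arrays directly would wrap around instead of going negative.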
Accumulative Differences: To eliminate sporadic "1"s in the segmentation mask, we may add memory to the motion detection process by forming accumulative difference images. Let s(x_1, x_2, k), s(x_1, x_2, k-1), ..., s(x_1, x_2, k-n) be a sequence of images, and let s(x_1, x_2, k) be the reference frame. An accumulative difference image is formed by comparing this reference image with every subsequent image in the sequence. A counter for each pixel location in the accumulative image is incremented every time the difference between the reference image and the next image in the sequence at that pixel location is bigger than the threshold.
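The accumulative difference image above can be sketched as follows: a minimal example in which the reference frame, the two subsequent frames, and the threshold T = 20 are assumed values chosen for illustration.

```python
import numpy as np

def accumulative_difference(reference, frames, T):
    """Per-pixel counter of how often each frame differs from the reference by > T."""
    counter = np.zeros(reference.shape, dtype=int)
    ref = reference.astype(int)
    for f in frames:
        counter += (np.abs(f.astype(int) - ref) > T).astype(int)
    return counter

ref = np.zeros((2, 2), dtype=np.uint8)
seq = [np.array([[0, 50], [0, 0]], dtype=np.uint8),
       np.array([[0, 60], [0, 50]], dtype=np.uint8)]
acc = accumulative_difference(ref, seq, T=20)
```

Pixels with a high count changed persistently (true motion); a count of 1 over a long sequence is the sporadic noise the method is designed to suppress.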
A Direct Method
i) Parametric modeling of the 2-D motion field: Define a transform with a set of parameters that maps pixels from frame k to frame k+1, and estimate the parameters of this transform in the image domain. ii) Segmentation: Regions undergoing the same 3-D motion have the same set of mapping parameters; thus, assign flow vectors having the same mapping parameters to the same class. The process iterates between parameter estimation and segmentation until a satisfactory result is obtained.
Let

s_{k+1}(x') = α s_k(x) + β + n_k(x)

where α and β describe global illumination changes, and n_k(x) denotes the noise. Assuming no occlusion effects,

s_{k+1}(x') = s_k(x),   x' = h(x, a)

where the transform h depends on: 1) The 3-D motion of the object. 2) The projection model from the 3-D space onto the camera plane. 3) The model of the object surface (planar, quadratic, etc.)
1) Planar surface, perspective projection: Let x and x' denote image plane coordinates under the perspective projection. Assume that the surface of the moving object is planar, X_3 = a X_1 + b X_2 + c. Then the transformation is given by

x_1' = (a_1 x_1 + a_2 x_2 + a_3) / (a_7 x_1 + a_8 x_2 + 1)
x_2' = (a_4 x_1 + a_5 x_2 + a_6) / (a_7 x_1 + a_8 x_2 + 1)

where a = (a_1, ..., a_8) is the vector of mapping parameters.

2) Planar surface, orthographic projection: In the case of parallel (orthographic) projection, we have the affine transform

where c = (c_1, …
Let

x_1 = m X_1,   x_2 = m X_2
x_1' = m X_1',  x_2' = m X_2'

describe the parallel projection. Substituting these into the 3-D displacement model and grouping terms with the same exponent, we arrive at the 12-parameter quadratic transform

x_1' = a_1 x_1² + a_2 x_2² + a_3 x_1 x_2 + a_4 x_1 + a_5 x_2 + a_6
x_2' = b_1 x_1² + b_2 x_2² + b_3 x_1 x_2 + b_4 x_1 + b_5 x_2 + b_6
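The two parametric mappings above can be evaluated directly: a minimal sketch in which all parameter values are assumed for illustration. With a = (1, 0, 0, 0, 1, 0, 0, 0) the perspective map is the identity; with only the linear terms set, the 12-parameter quadratic map reduces to an affine transform.

```python
def perspective_map(x1, x2, a):
    """8-parameter planar-surface perspective mapping."""
    a1, a2, a3, a4, a5, a6, a7, a8 = a
    den = a7 * x1 + a8 * x2 + 1.0
    return ((a1 * x1 + a2 * x2 + a3) / den,
            (a4 * x1 + a5 * x2 + a6) / den)

def quadratic_map(x1, x2, a, b):
    """12-parameter quadratic mapping (two sets of six coefficients)."""
    a1, a2, a3, a4, a5, a6 = a
    b1, b2, b3, b4, b5, b6 = b
    return (a1 * x1**2 + a2 * x2**2 + a3 * x1 * x2 + a4 * x1 + a5 * x2 + a6,
            b1 * x1**2 + b2 * x2**2 + b3 * x1 * x2 + b4 * x1 + b5 * x2 + b6)

p = perspective_map(3.0, 4.0, (1, 0, 0, 0, 1, 0, 0, 0))        # identity
q = quadratic_map(3.0, 4.0, (0, 0, 0, 1, 0, 0.5),               # pure translation
                  (0, 0, 0, 0, 1, -1.0))
```

Warping a whole frame with such a map would additionally require interpolating gray levels at the non-integer target coordinates.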
Remarks: The quadratic transform is generally used in optical flow segmentation and object-oriented description, because it provides a good approximation for many real-life images. It is not always possible to completely determine the 3-D motion of the object and the explicit surface structure using only the mapping parameters of the transform h(x, a). But for image coding applications this does not pose a serious problem, since the main interest is the prediction of the next frame from the current frame. The mapping approach presented here is not capable of handling occlusion effects.
Linear algorithms exist to find the mapping parameters given spatio-temporal intensity gradients. The contents of the images s_k(x) and s_{k+1}(x) must be sufficiently similar for estimation to be successful.

We estimate the mapping parameters to minimize the error function

J(â) = (1/2) E{ (s̃_{k+1}(x; â) - s_{k+1}(x))² }

where s̃_{k+1}(x; â) denotes the prediction of frame k+1 from frame k.
Each object is characterized by a specific mapping vector a. Thus, segmentation and motion estimation are treated as a combined problem.
- In the first step, the regions which have changed between s_k(x) and s_{k+1}(x) are determined (change detection).
- All isolated connected regions of the resulting segmentation are defined as objects of hierarchy level one. For each of these objects, a parameter vector a of a transform h(x, a) which relates the two images is estimated.
- Next, those regions of each object where the vector a is not valid are removed. These regions are defined as objects of the second hierarchy level.
- For the objects of level two and the remaining parts of level one, the parameter vectors are estimated.
- The procedure is repeated until the parameter vector for each region is consistent with the region.
Dense motion estimation (hierarchical, 3-step Lucas-Kanade). Start with randomly selected seed blocks (initial regions) and estimate affine parameters over each block. Merge regions with "similar" affine parameters to reduce the number of classes. Update regions by classifying each pixel into one of the motion classes, based on the similarity of the dense and the corresponding affine motion vectors, wherever a "good" match can be found. Re-estimate affine parameters over the updated regions, and iterate until convergence. Classify all "unassigned" pixels based on a DFD criterion.
CLUSTERING METHODS
1. Estimate the optical flow field.
2. Divide the motion field into rectangular blocks.
3. For each block, estimate the affine parameters by the method of linear least squares.
4. Threshold the motion residual by T_stage to determine reliable blocks.
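Step 3 above can be sketched directly: a minimal linear least-squares fit of the 6-parameter affine model v_1 = a_1 x_1 + a_2 x_2 + a_3, v_2 = a_4 x_1 + a_5 x_2 + a_6 to the flow vectors of one block. The flow samples are synthetic and assumed for illustration (generated exactly from an affine model, so the fit recovers the parameters).

```python
import numpy as np

def fit_affine(points, flows):
    """Least-squares affine motion parameters (a1..a6) from sparse flow samples."""
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    v = np.asarray(flows, dtype=float)
    a123, *_ = np.linalg.lstsq(A, v[:, 0], rcond=None)  # fit v1 component
    a456, *_ = np.linalg.lstsq(A, v[:, 1], rcond=None)  # fit v2 component
    return np.concatenate([a123, a456])

pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
true = np.array([0.5, -0.2, 1.0, 0.1, 0.3, -0.5])       # assumed "ground truth"
flows = [(true[0]*x + true[1]*y + true[2],
          true[3]*x + true[4]*y + true[5]) for x, y in pts]
params = fit_affine(pts, flows)
```

With noisy real flow, the residual of this fit is exactly the quantity thresholded by T_stage in step 4 to decide whether a block is reliable.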
5. … assignment.
6. Find the pixels that fall into the computed cluster using the velocity-checking criterion.
7. Delete all the assigned pixels from the image so that they will not be used in the next stage.
8. Eliminate small regions from the map obtained in step 7.
9. If all the pixels are assigned, then stop; otherwise go to step 4.
MAP SEGMENTATION
The MAP segmentation maximizes

p(z | v_1, v_2) = p(v_1, v_2 | z) p(z) / p(v_1, v_2)

given the optical flow data, where p(v_1, v_2 | z) is the conditional pdf of the optical flow data given the segmentation, and p(z) is the prior probability of the segmentation. 1) The segmentation field is modeled by a spatio-temporal Markov random field (MRF) to impose continuity (smoothness) of labels. 2) The conditional pdf models how well we can predict the measured (estimated) optical flow field.

Ref: Murray and Buxton.
The Conditional Probability: In the presence of noise n, the joint probability of the data given the segmentation labels is related to the noise distribution P_n(n) by

p(v_1, v_2 | z) = P_n(n)

Assuming that the noise is white and Gaussian, with zero mean and variance σ²,

P_n(n) = (1/(2πσ²)^{N/2}) exp{ -(1/(2σ²)) Σ_x ε²(x) }

where N is the number of data points, which depends on the way the optical flow data are distributed among the various scene facets.
The prior probability of the interpretation is modeled by an MRF with respect to some local neighborhood. Thus, it is given by a Gibbs distribution, which effectively introduces local constraints on the interpretation:

p(z) = (1/Q) Σ_ω exp{ -U(ω) } δ(z - ω)

where U(ω) is the sum of local potentials. Taking the logarithm of the MAP criterion, the maximization of the a posteriori probability distribution becomes minimization of the cost function

C = (1/(2σ²)) Σ_x ε²(x) + U(z)
The Algorithm:
1. Start with an initial labeling z of the optical flow vectors. Calculate the mapping parameters a = [a_1, ..., a_8]^T for each region using least squares fitting. Set the initial temperature for SA.
2. Scan the pixel sites according to a predefined convention. At each site x_i:
(a) Perturb the label z_i randomly.
(b) Decide whether to accept or reject this perturbation, based on the change in the cost function

ΔC = (1/(2σ²)) ε²(x_i) + Σ_{x_j ∈ N_{x_i}} V_C(z(x_i), z(x_j))

3. After all pixel sites are visited once, re-estimate the mapping parameters for each region in the least squares sense, based on the new segmentation label configuration.
4. Exit if a stopping criterion is satisfied. Otherwise, lower the temperature according to the schedule, and go to step 2.
The spatial and temporal continuity of the segmentation labels can be enforced by means of spatial and temporal Gibbs potential functions, where

U = Σ_{x_i} Σ_{x_j ∈ N_{x_i}} V_2s(z(x_i), z(x_j), L_ij) + Σ V_Γ(L) + Σ_{x_i} Σ_{x_k ∈ N_{x_i}} V_2t(z(x_i), z(x_k))

with

V_2s(z(x_i), z(x_j), L_ij) = -α_s  if z(x_i) = z(x_j) and L_ij is OFF
                               α_s  if z(x_i) ≠ z(x_j) and L_ij is OFF
                               0    if L_ij is ON

and

V_2t(z(x_i), z(x_k)) = -α_t  if z(x_i) = z(x_k)
                         α_t  otherwise

Here α_s and α_t are positive parameters which control the strength of the spatial and temporal continuity constraints, respectively.
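The two-point clique potentials above translate directly into code: a minimal sketch in which the values of α_s and α_t (and the label/line-field arguments) are assumed for illustration.

```python
def v2s(zi, zj, line_on, alpha_s=1.0):
    """Spatial two-point clique potential with a line field."""
    if line_on:                     # a discontinuity between the sites: no penalty
        return 0.0
    return -alpha_s if zi == zj else alpha_s

def v2t(zi, zk, alpha_t=0.5):
    """Temporal two-point clique potential (no line field across time)."""
    return -alpha_t if zi == zk else alpha_t
```

Summing these over all neighbor pairs reproduces the spatial and temporal terms of U; the line-field term V_Γ(L) would be added separately to penalize overuse of ON line sites.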
p(v_1, v_2, z | g_k, g_{k+1}) = p(g_{k+1} | g_k, v_1, v_2, z) p(v_1, v_2 | z, g_k) p(z | g_k) / p(g_{k+1} | g_k)

p(g_{k+1} | g_k, v_1, v_2, z) is characterized by the DFD, modeled by a Gaussian distribution. p(z | g_k) is modeled as Gibbsian for connected regions.
p(v_1, v_2 | z, g_k) relates the 2-D motion estimates to the 3-D scene:

p(v_1, v_2 | z, g_k) = p(v_1, v_2 | z) = (1/Q) exp{ -U(v_1, v_2 | z) }

where

U(v_1, v_2 | z) = Σ_x || v(x) - ṽ(x) ||²

The minimization is performed in two steps, alternating between estimation of the optical flow, estimation of the model parameters, and update of the segmentation labels.
1. Estimate the optical flow field (v_1, v_2), assuming that the segmentation field z is given. This step involves the minimization of a modified cost function

C_1 = Σ_x ε²_{v_1,v_2}(x) + α Σ_x || v(x) - ṽ(x) ||² + β Σ_{x_i} Σ_{x_j ∈ N_{x_i}} || v(x_i) - v(x_j) ||² δ(z(x_i) - z(x_j))

which is composed of all the terms in C that contain (v_1, v_2). While the first term indicates how well v explains our observations, the second and third terms impose prior constraints on the motion estimates: they should conform to the parametric flow model, and they should vary smoothly within each region. The algorithm is initialized with an optical flow field estimated using a global smoothness constraint. Given this estimate, we initialize the segmentation labels using a procedure similar to that of Wang and Adelson.
2. Estimate the segmentation field z, assuming the optical flow vectors (v_1, v_2) are given. This step involves the minimization of all terms in C that contain z, as well as (v_1', v_2'), the projection of the 3-D motion. The modified cost function is given by

C_2 = Σ_x ε²_{v_1',v_2'}(x) + α Σ_x || v(x) - v'(x) ||² + Σ_{x_i} Σ_{x_j ∈ N_{x_i}} V_2(z(x_i), z(x_j))

The first term quantifies how well the projected motion (v_1', v_2'), which depends on z and a, compensates for the motion. The second term measures the consistency of (v_1', v_2') with (v_1, v_2). The third term is related to the prior probability of the present configuration of the segmentation labels. This step includes the least squares estimation of the mapping parameters a. A hierarchical implementation of this algorithm is also possible, by forming successive low-pass filtered versions of g_k and g_{k+1}.
[Flowchart: input video → 2-D dense motion estimation (e.g., Lucas-Kanade) → iterative segmentation and parameter updates based on the MAP criterion using Gibbsian priors → go to next frame]
Perform pixel-based motion segmentation (dotted line) to determine the number of motion classes and the parametric model for each class. Perform color segmentation to define regions bounded by edges (solid lines). Assign each color region to one of the motion classes, based on the motion criterion, the DFD criterion, or a combination of them.