
Lecture 17: Object Detection

Professor Fei-Fei Li
Stanford Vision Lab
18-Nov-11

Object detection


What we will learn today

- Implicit Shape Model
  - Representation
  - Recognition
  - Experiments and results

- Deformable Models
  - The PASCAL challenge
  - Latent SVM Model


Implicit Shape Model (ISM)

Basic ideas:
- Learn an appearance codebook
- Learn a star-topology structural model (features x1, ..., x6 are linked only to the object center)
  - Features are considered independent given the object center

Algorithm: probabilistic Generalized Hough Transform

Standard Gen. Hough Transform     Probabilistic extension
Exact correspondences          -> Prob. match to object part
NN matching                    -> Soft matching
Feature location on obj.       -> Part location distribution
Uniform votes                  -> Probabilistic vote weighting
Quantized Hough array          -> Continuous Hough space

Source: Bastian Leibe



Implicit Shape Model: Basic Idea

- A visual vocabulary is used to index votes for object position (a visual word = part).
- [Figure: training image; visual codeword with displacement vectors]

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
Source: Bastian Leibe

Implicit Shape Model: Basic Idea

- Objects are detected as consistent configurations of the observed parts (visual words).
- [Figure: test image]

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
Source: Bastian Leibe

Implicit Shape Model - Representation

Training images (+ reference segmentation)

Learn appearance codebook:
- Extract local features at interest points
- Agglomerative clustering -> codebook

Learn spatial distributions:
- Match the codebook to the training images
- Record matching positions on the object
-> Spatial occurrence distributions (+ local figure-ground labels)

A toy sketch of both stages follows below.

Source: Bastian Leibe
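To make the two training stages concrete, here is a minimal Python/NumPy sketch. It is illustrative only: the clustering cut threshold, the data layout, and all names are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def learn_ism_model(features, locations, object_centers, cut_distance):
    """Sketch of ISM training.
    features:       (N, D) descriptors from all training images
    locations:      (N, 2) feature positions in their images
    object_centers: (N, 2) annotated object center in each feature's image
    """
    # 1) Appearance codebook: agglomerative clustering of descriptors,
    #    cut at a fixed distance threshold.
    Z = linkage(features, method="average")
    assignments = fcluster(Z, t=cut_distance, criterion="distance")

    codebook = {}     # cluster id -> mean descriptor
    occurrences = {}  # cluster id -> displacement vectors to the object center
    for cid in np.unique(assignments):
        members = assignments == cid
        codebook[cid] = features[members].mean(axis=0)
        # 2) Spatial distribution: record where each occurrence sits
        #    relative to the object center (the vote it will later cast).
        occurrences[cid] = object_centers[members] - locations[members]
    return codebook, occurrences
```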


Implicit Shape Model - Recognition

Pipeline: interest points -> matched codebook entries -> probabilistic voting -> 3D voting space (continuous)

An image feature $f$ observed at location $l$ is interpreted as codebook matches $C_i$, which cast votes for object positions $(o, x)$:

$p(C_i \mid f)$ : matching probability
$p(o_n, x \mid C_i, l)$ : occurrence distribution

Probabilistic vote weighting (will be explained later in detail; a toy voting sketch follows below).

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
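A hedged sketch of the voting step, continuing the toy model above. The uniform weights implement the $p(C_i \mid f)$ and $p(o_n, x \mid C_i, l)$ terms exactly as defined later in the derivation; the matching threshold `match_dist` is an assumed parameter.

```python
import numpy as np

def cast_votes(features, locations, codebook, occurrences, match_dist):
    """Soft-match each test feature to the codebook and collect
    continuous, weighted votes for the object center."""
    ids = list(codebook)                           # cluster ids
    protos = np.stack([codebook[c] for c in ids])  # (K, D) prototypes
    votes, weights = [], []
    for f, l in zip(features, locations):
        # Activated entries C*: all codewords within the matching distance.
        d = np.linalg.norm(protos - f, axis=1)
        active = [ids[j] for j in np.where(d <= match_dist)[0]]
        if not active:
            continue
        p_match = 1.0 / len(active)              # p(C_i | f): uniform over matches
        for cid in active:
            occ = occurrences[cid]               # stored displacement vectors
            p_occ = 1.0 / len(occ)               # p(o, x | C_i, l): uniform
            for disp in occ:
                votes.append(l + disp)           # continuous vote for the center
                weights.append(p_match * p_occ)  # w = p(C_i|f) * p(o,x|C_i,l)
    return np.asarray(votes), np.asarray(weights)
```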

Implicit Shape Model - Recognition

Pipeline: interest points -> matched codebook entries -> probabilistic voting -> 3D voting space (continuous) -> backprojection of maxima -> backprojected hypotheses

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Example: Results on Cows

[Figure sequence: original image -> interest points -> matched patches -> probabilistic votes -> 1st, 2nd, and 3rd hypotheses]

Source: Bastian Leibe; K. Grauman

Scale Invariant Voting

- Scale-invariant feature selection
  - Scale-invariant interest points
  - Rescale extracted patches
  - Match to a constant-size codebook

- Generate scale votes
  - Scale as the 3rd dimension in the voting space
  - Search for maxima in the 3D (x, y, s) voting space with a search window

Source: Bastian Leibe


Scale Voting: Efficient Computation

Pipeline: scale votes -> binned accumulator array -> candidate maxima -> refinement (mean-shift)

Continuous Generalized Hough Transform:
- Binned accumulator array, similar to the standard Gen. Hough Transform
- Quickly identify candidate maximum locations
- Refine locations by mean-shift search only around those points
- Avoid quantization effects by keeping the exact vote locations
- Mean-shift interpretation: kernel probability density estimation (a sketch follows below)

Source: Bastian Leibe
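A minimal mean-shift refinement sketch over the continuous (x, y, s) votes, assuming a flat kernel with a fixed bandwidth; the scale-adaptive "balloon" variant on the next slide would grow the bandwidth with the hypothesis scale.

```python
import numpy as np

def mean_shift_refine(votes, weights, start, bandwidth, iters=20, tol=1e-3):
    """Refine one candidate maximum in the continuous voting space.
    votes: (N, 3) vote locations (x, y, s); weights: (N,) vote weights."""
    mode = np.asarray(start, dtype=float)
    for _ in range(iters):
        # Flat kernel: weighted mean of all votes inside the search window.
        inside = np.linalg.norm(votes - mode, axis=1) <= bandwidth
        if not inside.any():
            break
        w = weights[inside]
        new_mode = (votes[inside] * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(new_mode - mode) < tol:
            return new_mode
        mode = new_mode
    return mode
```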

Scale Voting: Efficient Computation (cont.)

Scale-adaptive mean-shift search for refinement:
- Increase the search window size with the hypothesis scale
- Scale-adaptive "balloon" density estimator

Source: Bastian Leibe



Detection Results

Qualitative performance:
- Recognizes different kinds of objects
- Robust to clutter, occlusion, noise, low contrast

Source: Bastian Leibe

Figure-Ground Segregation

Which happens first: segmentation or recognition?
- Problem extensively studied in psychophysics
- Experiments with ambiguous figure-ground stimuli
- Results: evidence that object recognition can and does operate before figure-ground organization
- Interpreted as the Gestalt cue of familiarity

M.A. Peterson, Object Recognition Processes Can and Do Operate Before Figure-Ground Organization, Current Directions in Psychological Science, 3:105-111, 1994.

ISM Top-Down Segmentation

Pipeline: interest points -> matched codebook entries -> probabilistic voting -> 3D voting space (continuous) -> backprojection of maxima -> backprojected hypotheses -> segmentation (per-pixel p(figure) probabilities)

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Top-Down Segmentation: Motivation

Secondary hypotheses (mixtures of cars/cows/etc.)
- Robustness to occlusion is a desired property of the algorithm!
- Standard solution: reject based on bounding-box overlap
  - Problematic: may lead to missed detections!
- Use segmentations to resolve ambiguities instead
- Basic idea: each observed pixel can be explained by at most one detection

Source: Bastian Leibe

Segmentation: Probabilistic Formulation

Influence of a patch $(f, l)$ on the object hypothesis (the vote weight):

$$p(f, l \mid o_n, x) \;=\; \frac{\sum_i p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)}$$

Backprojection to features $f$ and pixels $p$:

$$p(p = \text{figure} \mid o_n, x) \;=\; \sum_{p \in (f,l)} p(p = \text{figure} \mid f, l, o_n, x)\; p(f, l \mid o_n, x)$$

The first factor carries the segmentation information; the second is the patch's influence on the object hypothesis.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Derivation: ISM Recognition

Algorithm stages: 1. Voting, 2. Mean-shift search, 3. Backprojection

Vote weights: the contribution of a single feature $f$ observed at location $l$. The feature is interpreted as codebook matches $C_i$, which imply object locations $(o_n, x)$:

$p(C_i \mid f)$ : matching probability
$p(o_n, x \mid C_i, l)$ : occurrence distribution

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Derivation: ISM Recognition (cont.)

Probability that object $o_n$ occurs at location $x$ given $(f, l)$, marginalizing over the codebook matches:

$$p(o_n, x \mid f, l) \;=\; \sum_i p(C_i \mid f)\; p(o_n, x \mid C_i, l)$$

(matching probability times occurrence distribution)

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Derivation: ISM Recognition (cont.)

How do we measure those probabilities? Both are taken to be uniform:

$$p(C_i \mid f) = \frac{1}{|C^*|}, \quad \text{where } C^* = \{\,C_i \mid d(C_i, f) \le t\,\} \text{ is the set of activated codebook entries}$$

$$p(o_n, x \mid C_i, l) = \frac{1}{\#\text{occurrences}(C_i)}$$

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
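A quick worked example with hypothetical numbers: if a feature activates $|C^*| = 4$ codebook entries and entry $C_i$ has 10 recorded occurrences, each vote cast through $C_i$ carries weight

$$w \;=\; p(C_i \mid f)\, p(o_n, x \mid C_i, l) \;=\; \frac{1}{|C^*|}\cdot\frac{1}{\#\text{occ}(C_i)} \;=\; \frac{1}{4}\cdot\frac{1}{10} \;=\; \frac{1}{40}.$$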

Derivation: ISM Recognition (cont.)

Likelihood of an observed feature given the object hypothesis (Bayes' rule, then substituting the vote distribution from above):

$$p(f, l \mid o_n, x) \;=\; \frac{p(o_n, x \mid f, l)\, p(f, l)}{p(o_n, x)} \;=\; \frac{\sum_i p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)}$$

Here $p(f, l)$ is an indicator variable for the sampled features, and $p(o_n, x)$ is a prior over object locations.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Derivation: ISM Top-Down Segmentation

Algorithm stages: 1. Voting, 2. Mean-shift search, 3. Backprojection

Figure-ground backprojection: marginalize over all features $(f, l)$ containing pixel $p$, and over all codebook entries $C_i$ matched to $f$:

$$p(p = \text{figure} \mid o_n, x) \;=\; \sum_{p \in (f,l)} \sum_i \; p(p = \text{fig.} \mid o_n, x, C_i, l)\; \frac{p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)}$$

The first factor is the figure/ground label stored for each occurrence; the second is that occurrence's influence on the object hypothesis.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

This may sound quite complicated, but it boils down to a very simple algorithm.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Top-Down Segmentation Algorithm

[Figure: original image -> p(figure) and p(ground) maps -> segmentation]

Interpretation of the p(figure) map:
- Per-pixel confidence in the object hypothesis
- Use for hypothesis verification (a sketch follows below)

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
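Continuing the toy implementation, a hedged sketch of this backprojection: every vote that supported the winning hypothesis paints its patch's stored figure-ground mask into a per-pixel accumulator. The data layout is an assumption.

```python
import numpy as np

def backproject_segmentation(img_shape, contributing_votes, eps=1e-9):
    """Per-pixel p(figure) from the votes supporting one hypothesis.
    contributing_votes: iterable of ((y, x), fig_mask, weight), where fig_mask
    is the figure-ground label map stored with the matched occurrence."""
    fig = np.zeros(img_shape)     # accumulated 'figure' evidence
    total = np.zeros(img_shape)   # total evidence per pixel
    for (y, x), mask, w in contributing_votes:
        h, wid = mask.shape
        fig[y:y+h, x:x+wid] += w * mask   # weighted fig./gnd. label
        total[y:y+h, x:x+wid] += w        # influence on the hypothesis
    return fig / (total + eps)            # pixels with no evidence stay ~0
```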

Example Results: Motorbikes

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Example Results: Cows

- Training: 112 hand-segmented images
- Results on novel sequences: single-frame recognition, no temporal continuity used!

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Example Results: Chairs

- Dining room chairs
- Office chairs

Source: Bastian Leibe

Detections Using Ground Plane Constraints

- Left camera, 1175 frames
- Battery of 5 ISM detectors for different car views

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Inferring Other Information: Part Labels

[Figures: training, test, and output examples of part-label transfer]

Inferring Other Information: Depth Maps

Depth from a single image.

[Thomas, Ferrari, Tuytelaars, Leibe, Van Gool, 3DRR07; RSS08]

Extension: Estimating Articulation

Try to fit a silhouette to the detected person.

Basic idea: search for the silhouette that simultaneously optimizes
- the Chamfer match to the distance-transformed edge image, and
- the overlap with the top-down segmentation.

This enforces global consistency.
Caveat: it reintroduces a reliance on a global model.

[Leibe, Seemann, Schiele, CVPR05]


Extension: Rotation-Invariant Detection

Polar instead of Cartesian voting scheme.

Benefits:
- Recognizes objects under image-plane rotations
- Possibility to share parts between articulations

Caveats:
- Rotation invariance should only be used when it is really needed
  (it also increases false-positive detections)

[Mikolajczyk, Leibe, Schiele, CVPR06]


Sometimes, Rotation Invariance Is Needed

[Mikolajczyk et al., CVPR06]


You Can Try It At Home

Linux binaries available, including datasets and several pre-trained detectors:
http://www.vision.ee.ethz.ch/bleibe/code

Source: Bastian Leibe

Discussion: Implicit Shape Model

Pros:
- Works well for many different object categories, both rigid and articulated
- Flexible geometric model: can recombine parts seen on different training examples
- Learns from relatively few (50-100) training examples
- Optimized for detection; good localization properties

Cons:
- Needs supervised training data: object bounding boxes for detection, reference segmentations for top-down segmentation
- Only weak geometric constraints: resulting segmentations may contain superfluous body parts
- Purely representative model: no discriminative learning

Source: Bastian Leibe



What we will learn today

- Implicit Shape Model
  - Representation
  - Recognition
  - Experiments and results

- Deformable Models
  - The PASCAL challenge
  - Latent SVM Model


Object Detection: the PASCAL Challenge

- ~10,000 images, with ~25,000 target objects
- Objects from 20 categories (person, car, bicycle, cow, table, ...)
- Objects are annotated with labeled bounding boxes

Source: Pedro Felzenszwalb



[Figure: a detection is scored by a root filter, part filters, and deformation models]

Source: Pedro Felzenszwalb

Latent SVM Model: an Overview

- The image is partitioned into 8x8-pixel blocks
- In each block we compute a histogram of gradient orientations
  - Invariant to changes in lighting, small deformations, etc.
- We compute features at different resolutions (a pyramid)

Source: Pedro Felzenszwalb

Histogram of Oriented Gradient (HOG) Features

Filters:
- Filters are rectangular templates defining weights for the features
- The score is the dot product of the filter and a subwindow of the HOG pyramid: the score of filter $W$ at a location in the HOG pyramid $H$ is $W \cdot H$ (a sketch follows below)

Source: Pedro Felzenszwalb
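A hedged sketch of this scoring, assuming one HOG pyramid level is stored as an (H, W, D) array of per-cell histograms; sliding the filter over the level yields a map of dot-product scores.

```python
import numpy as np

def filter_score_map(hog_level, filt):
    """Score of a linear filter at every placement in one HOG pyramid level.
    hog_level: (H, W, D) array of per-cell HOG features
    filt:      (h, w, D) rectangular template of weights
    Returns an (H-h+1, W-w+1) map of dot-product scores."""
    H, W, D = hog_level.shape
    h, w, _ = filt.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            window = hog_level[y:y+h, x:x+w, :]
            scores[y, x] = np.sum(window * filt)  # dot product <W, subwindow of H>
    return scores
```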

Object Hypothesis

- The score is the sum of the filter scores plus the deformation scores (written out below)
- The multiscale model captures features at two resolutions
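Written out, the score of a placement $(p_0, \dots, p_n)$ of the root and part filters has the standard deformable-part-model form (notation assumed here: $F_i$ are the filters, $\phi(H, p_i)$ the HOG features under placement $p_i$, $d_i$ the deformation weights, and $(dx_i, dy_i)$ the displacement of part $i$ from its anchor):

$$\text{score}(p_0, \dots, p_n) \;=\; \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) \;-\; \sum_{i=1}^{n} d_i \cdot \big(dx_i,\, dy_i,\, dx_i^2,\, dy_i^2\big)$$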



Training the Latent SVM Model

- Training data consists of images with labeled bounding boxes
- We need to learn the model structure, the filters, and the deformation costs

Source: Pedro Felzenszwalb



Connection with Linear Classifiers

- The score of the model is the sum of the filter scores plus the deformation scores
- A bounding box in the training data specifies that the score should be high for some placement in a range

Standard SVM: $f_w(x) = w \cdot \Phi(x)$
Latent SVM: $f_w(x) = \max_z\, w \cdot \Phi(x, z)$

Here $w$ is the model (the weight vector: a concatenation of the filters and deformation parameters), $x$ is a detection window, and $z$ are the filter placements; $\Phi(x, z)$ is the concatenation of features and part displacements.


Latent SVMs

The training objective pairs a regularizer with a hinge loss over the labeled examples $(x_i, y_i)$; the $x_i$ and $y_i$ are observed variables, the placements $z$ are latent:

$$\frac{1}{2}\|w\|^2 \;+\; C \sum_{i=1}^{n} \max\!\big(0,\, 1 - y_i f_w(x_i)\big), \qquad f_w(x) = \max_z\, w \cdot \Phi(x, z)$$

The first term is the regularization, the second the hinge loss; $f_w$ is linear in $w$ if $z$ is fixed.


Latent SVM Training

Semi-convex optimization problem:
- A maximum of convex (here linear) functions is convex, so $f_w(x) = \max_z w \cdot \Phi(x, z)$ is convex in $w$
- The hinge loss $\max(0,\, 1 - y_i f_w(x_i))$ is therefore convex if $y_i = -1$, but not convex if $y_i = 1$

Latent SVM Training (cont.)

The problem becomes convex if we fix $z$ for the positive examples: with $z$ fixed, $f_w$ is affine in $w$ for the examples with $y_i = 1$.

Iterative optimization procedure (a sketch follows below):
- Initialize $w$ and iterate:
  - Pick the best $z$ for each positive example
  - Optimize $w$ via gradient descent with data mining
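A hedged sketch of that loop, not the authors' code: `feat` and `placements` are assumed callables, and the hard-negative data mining is reduced to re-scoring all negatives each step.

```python
import numpy as np

def train_latent_svm(positives, negatives, feat, placements, C=0.01,
                     outer_iters=5, sgd_steps=200, lr=1e-3):
    """Coordinate-descent sketch of latent SVM training.
    feat(x, z)    -> feature vector Phi(x, z) (NumPy array)
    placements(x) -> iterable of candidate latent placements z for example x"""
    w = np.zeros_like(feat(positives[0], next(iter(placements(positives[0])))))
    for _ in range(outer_iters):
        # Step 1: fix z for each positive = its best placement under current w.
        pos_phis = [max((feat(x, z) for z in placements(x)),
                        key=lambda phi: float(w @ phi)) for x in positives]
        # Step 2: subgradient descent on 1/2||w||^2 + C * sum of hinge losses.
        for _ in range(sgd_steps):
            grad = w.copy()                            # gradient of the regularizer
            for phi in pos_phis:                       # y = +1, z fixed (affine)
                if w @ phi < 1:
                    grad -= C * phi
            for x in negatives:                        # y = -1, max over z (convex)
                phi = max((feat(x, z) for z in placements(x)),
                          key=lambda p: float(w @ p))  # stand-in for data mining
                if w @ phi > -1:
                    grad += C * phi
            w -= lr * grad
    return w
```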

Latent SVM Training: Initializing w

For a k-component mixture model (a sketch of the first step follows below):
- Split the examples into k sets based on bounding-box aspect ratio
- Learn k root filters using a standard SVM
  - Training data: warped positive examples and random windows from negative images (Dalal & Triggs)
- Initialize the parts by selecting patches from the root filters:
  - Sub-windows with strong coefficients
  - Interpolate to get higher-resolution filters
  - Initialize the spatial model using fixed spring constants
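A small illustrative sketch of the aspect-ratio split; the quantile-based grouping is an assumption, chosen only to show the idea.

```python
import numpy as np

def split_by_aspect_ratio(boxes, k):
    """boxes: (N, 4) array of (x1, y1, x2, y2); returns k index groups."""
    aspect = (boxes[:, 2] - boxes[:, 0]) / (boxes[:, 3] - boxes[:, 1])
    # Split at aspect-ratio quantiles so each component gets ~N/k examples.
    interior = np.quantile(aspect, np.linspace(0, 1, k + 1))[1:-1]
    component = np.digitize(aspect, interior)  # component id in 0..k-1
    return [np.where(component == c)[0] for c in range(k)]
```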

Learned Models

[Figure: learned models]

Source: Pedro Felzenszwalb

Example Results

[Figures: example detections]

More Results

[Figures: more detections]

Quantitative Results

9 systems competed in the 2007 challenge. Out of 20 classes, this system took:
- First place in 10 classes
- Second place in 6 classes

Some statistics:
- It takes ~2 seconds to evaluate a model on one image
- It takes ~3 hours to train a model
- MUCH faster than most systems

Source: Pedro Felzenszwalb

Code for Latent SVM

Source code for the system, and models trained on PASCAL 2006, 2007, and 2008 data, is available at:
http://www.cs.uchicago.edu/~pff/latent

Source: Pedro Felzenszwalb



Summary

- Deformable models provide an elegant framework for object detection and recognition
  - Efficient algorithms for matching models to images
  - Applications: pose estimation, medical image analysis, object recognition, etc.
- We can learn models from partially labeled data
  - Generalizes standard ideas from machine learning
  - Leads to state-of-the-art results in the PASCAL challenge
- Future work: hierarchical models, grammars, 3D objects

Source: Pedro Felzenszwalb

What we have learned today

- Implicit Shape Model
  - Representation
  - Recognition
  - Experiments and results

- Deformable Models
  - The PASCAL challenge
  - Latent SVM Model
