
Lecture 17: Object Detection

Professor Fei-Fei Li
Stanford Vision Lab
18-Nov-11

Object detection


What we will learn today

- Implicit Shape Model
  - Representation
  - Recognition
  - Experiments and results

- Deformable Models
  - The PASCAL challenge
  - Latent SVM Model


Implicit Shape Model (ISM)

Basic ideas:
- Learn an appearance codebook
- Learn a star-topology structural model (features x1, ..., x6 are linked only to the object center)
  - Features are considered independent given the object center

Algorithm: probabilistic Generalized Hough Transform

Standard Gen. Hough Transform     Probabilistic extension
Exact correspondences          -> Prob. match to object part
NN matching                    -> Soft matching
Feature location on obj.       -> Part location distribution
Uniform votes                  -> Probabilistic vote weighting
Quantized Hough array          -> Continuous Hough space

Source: Bastian Leibe



Implicit Shape Model: Basic Idea

- A visual vocabulary is used to index votes for object position (a visual word = part).
- [Figure: training image; visual codeword with displacement vectors]

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
Source: Bastian Leibe

Implicit Shape Model: Basic Idea

- Objects are detected as consistent configurations of the observed parts (visual words).
- [Figure: test image]

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
Source: Bastian Leibe

Implicit Shape Model - Representation

Training images (+ reference segmentation)

Learn appearance codebook:
- Extract local features at interest points
- Agglomerative clustering -> codebook

Learn spatial distributions:
- Match the codebook to the training images
- Record matching positions on the object
-> Spatial occurrence distributions (+ local figure-ground labels)

A toy sketch of both stages follows below.

Source: Bastian Leibe
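To make the two training stages concrete, here is a minimal Python/NumPy sketch. It is illustrative only: the clustering cut threshold, the data layout, and all names are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def learn_ism_model(features, locations, object_centers, cut_distance):
    """Sketch of ISM training.
    features:       (N, D) descriptors from all training images
    locations:      (N, 2) feature positions in their images
    object_centers: (N, 2) annotated object center in each feature's image
    """
    # 1) Appearance codebook: agglomerative clustering of descriptors,
    #    cut at a fixed distance threshold.
    Z = linkage(features, method="average")
    assignments = fcluster(Z, t=cut_distance, criterion="distance")

    codebook = {}     # cluster id -> mean descriptor
    occurrences = {}  # cluster id -> displacement vectors to the object center
    for cid in np.unique(assignments):
        members = assignments == cid
        codebook[cid] = features[members].mean(axis=0)
        # 2) Spatial distribution: record where each occurrence sits
        #    relative to the object center (the vote it will later cast).
        occurrences[cid] = object_centers[members] - locations[members]
    return codebook, occurrences
```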


Implicit Shape Model - Recognition

Pipeline: interest points -> matched codebook entries -> probabilistic voting -> 3D voting space (continuous)

An image feature $f$ observed at location $l$ is interpreted as codebook matches $C_i$, which cast votes for object positions $(o, x)$:

$p(C_i \mid f)$ : matching probability
$p(o_n, x \mid C_i, l)$ : occurrence distribution

Probabilistic vote weighting (will be explained later in detail; a toy voting sketch follows below).

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
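A hedged sketch of the voting step, continuing the toy model above. The uniform weights implement the $p(C_i \mid f)$ and $p(o_n, x \mid C_i, l)$ terms exactly as defined later in the derivation; the matching threshold `match_dist` is an assumed parameter.

```python
import numpy as np

def cast_votes(features, locations, codebook, occurrences, match_dist):
    """Soft-match each test feature to the codebook and collect
    continuous, weighted votes for the object center."""
    ids = list(codebook)                           # cluster ids
    protos = np.stack([codebook[c] for c in ids])  # (K, D) prototypes
    votes, weights = [], []
    for f, l in zip(features, locations):
        # Activated entries C*: all codewords within the matching distance.
        d = np.linalg.norm(protos - f, axis=1)
        active = [ids[j] for j in np.where(d <= match_dist)[0]]
        if not active:
            continue
        p_match = 1.0 / len(active)              # p(C_i | f): uniform over matches
        for cid in active:
            occ = occurrences[cid]               # stored displacement vectors
            p_occ = 1.0 / len(occ)               # p(o, x | C_i, l): uniform
            for disp in occ:
                votes.append(l + disp)           # continuous vote for the center
                weights.append(p_match * p_occ)  # w = p(C_i|f) * p(o,x|C_i,l)
    return np.asarray(votes), np.asarray(weights)
```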

Implicit Shape Model - Recognition

Pipeline: interest points -> matched codebook entries -> probabilistic voting -> 3D voting space (continuous) -> backprojection of maxima -> backprojected hypotheses

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Example: Results on Cows

[Figure sequence: original image -> interest points -> matched patches -> probabilistic votes -> 1st, 2nd, and 3rd hypotheses]

Source: Bastian Leibe; K. Grauman

Scale Invariant Voting

- Scale-invariant feature selection
  - Scale-invariant interest points
  - Rescale extracted patches
  - Match to a constant-size codebook

- Generate scale votes
  - Scale as the 3rd dimension in the voting space
  - Search for maxima in the 3D (x, y, s) voting space with a search window

Source: Bastian Leibe


Scale Voting: Efficient Computation

Pipeline: scale votes -> binned accumulator array -> candidate maxima -> refinement (mean-shift)

Continuous Generalized Hough Transform:
- Binned accumulator array, similar to the standard Gen. Hough Transform
- Quickly identify candidate maximum locations
- Refine locations by mean-shift search only around those points
- Avoid quantization effects by keeping the exact vote locations
- Mean-shift interpretation: kernel probability density estimation (a sketch follows below)

Source: Bastian Leibe
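A minimal mean-shift refinement sketch over the continuous (x, y, s) votes, assuming a flat kernel with a fixed bandwidth; the scale-adaptive "balloon" variant on the next slide would grow the bandwidth with the hypothesis scale.

```python
import numpy as np

def mean_shift_refine(votes, weights, start, bandwidth, iters=20, tol=1e-3):
    """Refine one candidate maximum in the continuous voting space.
    votes: (N, 3) vote locations (x, y, s); weights: (N,) vote weights."""
    mode = np.asarray(start, dtype=float)
    for _ in range(iters):
        # Flat kernel: weighted mean of all votes inside the search window.
        inside = np.linalg.norm(votes - mode, axis=1) <= bandwidth
        if not inside.any():
            break
        w = weights[inside]
        new_mode = (votes[inside] * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(new_mode - mode) < tol:
            return new_mode
        mode = new_mode
    return mode
```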

Scale Voting: Efficient Computation (cont.)

Scale-adaptive mean-shift search for refinement:
- Increase the search window size with the hypothesis scale
- Scale-adaptive "balloon" density estimator

Source: Bastian Leibe



Detection Results

Qualitative performance:
- Recognizes different kinds of objects
- Robust to clutter, occlusion, noise, low contrast

Source: Bastian Leibe

Figure-Ground Segregation

Which happens first: segmentation or recognition?
- Problem extensively studied in psychophysics
- Experiments with ambiguous figure-ground stimuli
- Results: evidence that object recognition can and does operate before figure-ground organization
- Interpreted as the Gestalt cue of familiarity

M.A. Peterson, Object Recognition Processes Can and Do Operate Before Figure-Ground Organization, Current Directions in Psychological Science, 3:105-111, 1994.

ISM Top-Down Segmentation

Pipeline: interest points -> matched codebook entries -> probabilistic voting -> 3D voting space (continuous) -> backprojection of maxima -> backprojected hypotheses -> segmentation (per-pixel p(figure) probabilities)

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Top-Down Segmentation: Motivation

Secondary hypotheses (mixtures of cars/cows/etc.)
- Robustness to occlusion is a desired property of the algorithm!
- Standard solution: reject based on bounding-box overlap
  - Problematic: may lead to missed detections!
- Use segmentations to resolve ambiguities instead
- Basic idea: each observed pixel can be explained by at most one detection

Source: Bastian Leibe

Segmentation: Probabilistic Formulation

Influence of a patch $(f, l)$ on the object hypothesis (the vote weight):

$$p(f, l \mid o_n, x) \;=\; \frac{\sum_i p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)}$$

Backprojection to features $f$ and pixels $p$:

$$p(p = \text{figure} \mid o_n, x) \;=\; \sum_{p \in (f,l)} p(p = \text{figure} \mid f, l, o_n, x)\; p(f, l \mid o_n, x)$$

The first factor carries the segmentation information; the second is the patch's influence on the object hypothesis.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Derivation: ISM Recognition

Algorithm stages: 1. Voting, 2. Mean-shift search, 3. Backprojection

Vote weights: the contribution of a single feature $f$ observed at location $l$. The feature is interpreted as codebook matches $C_i$, which imply object locations $(o_n, x)$:

$p(C_i \mid f)$ : matching probability
$p(o_n, x \mid C_i, l)$ : occurrence distribution

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Derivation: ISM Recognition (cont.)

Probability that object $o_n$ occurs at location $x$ given $(f, l)$, marginalizing over the codebook matches:

$$p(o_n, x \mid f, l) \;=\; \sum_i p(C_i \mid f)\; p(o_n, x \mid C_i, l)$$

(matching probability times occurrence distribution)

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Derivation: ISM Recognition (cont.)

How do we measure those probabilities? Both are taken to be uniform:

$$p(C_i \mid f) = \frac{1}{|C^*|}, \quad \text{where } C^* = \{\,C_i \mid d(C_i, f) \le t\,\} \text{ is the set of activated codebook entries}$$

$$p(o_n, x \mid C_i, l) = \frac{1}{\#\text{occurrences}(C_i)}$$

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
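A quick worked example with hypothetical numbers: if a feature activates $|C^*| = 4$ codebook entries and entry $C_i$ has 10 recorded occurrences, each vote cast through $C_i$ carries weight

$$w \;=\; p(C_i \mid f)\, p(o_n, x \mid C_i, l) \;=\; \frac{1}{|C^*|}\cdot\frac{1}{\#\text{occ}(C_i)} \;=\; \frac{1}{4}\cdot\frac{1}{10} \;=\; \frac{1}{40}.$$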

Derivation: ISM Recognition (cont.)

Likelihood of an observed feature given the object hypothesis (Bayes' rule, then substituting the vote distribution from above):

$$p(f, l \mid o_n, x) \;=\; \frac{p(o_n, x \mid f, l)\, p(f, l)}{p(o_n, x)} \;=\; \frac{\sum_i p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)}$$

Here $p(f, l)$ is an indicator variable for the sampled features, and $p(o_n, x)$ is a prior over object locations.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]


Derivation: ISM Top-Down Segmentation

Algorithm stages: 1. Voting, 2. Mean-shift search, 3. Backprojection

Figure-ground backprojection: marginalize over all features $(f, l)$ containing pixel $p$, and over all codebook entries $C_i$ matched to $f$:

$$p(p = \text{figure} \mid o_n, x) \;=\; \sum_{p \in (f,l)} \sum_i \; p(p = \text{fig.} \mid o_n, x, C_i, l)\; \frac{p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)}$$

The first factor is the figure/ground label stored for each occurrence; the second is that occurrence's influence on the object hypothesis.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

This may sound quite complicated, but it boils down to a very simple algorithm.

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Top-Down Segmentation Algorithm

[Figure: original image -> p(figure) and p(ground) maps -> segmentation]

Interpretation of the p(figure) map:
- Per-pixel confidence in the object hypothesis
- Use for hypothesis verification (a sketch follows below)

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
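Continuing the toy implementation, a hedged sketch of this backprojection: every vote that supported the winning hypothesis paints its patch's stored figure-ground mask into a per-pixel accumulator. The data layout is an assumption.

```python
import numpy as np

def backproject_segmentation(img_shape, contributing_votes, eps=1e-9):
    """Per-pixel p(figure) from the votes supporting one hypothesis.
    contributing_votes: iterable of ((y, x), fig_mask, weight), where fig_mask
    is the figure-ground label map stored with the matched occurrence."""
    fig = np.zeros(img_shape)     # accumulated 'figure' evidence
    total = np.zeros(img_shape)   # total evidence per pixel
    for (y, x), mask, w in contributing_votes:
        h, wid = mask.shape
        fig[y:y+h, x:x+wid] += w * mask   # weighted fig./gnd. label
        total[y:y+h, x:x+wid] += w        # influence on the hypothesis
    return fig / (total + eps)            # pixels with no evidence stay ~0
```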

Example Results: Motorbikes

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Example Results: Cows

- Training: 112 hand-segmented images
- Results on novel sequences: single-frame recognition, no temporal continuity used!

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Example Results: Chairs

- Dining room chairs
- Office chairs

Source: Bastian Leibe

Detections Using Ground Plane Constraints

- Left camera, 1175 frames
- Battery of 5 ISM detectors for different car views

[Leibe, Leonardis, Schiele, SLCV04; IJCV08]

Inferring Other Information: Part Labels

[Figures: training, test, and output examples of part-label transfer]

Inferring Other Information: Depth Maps

Depth from a single image.

[Thomas, Ferrari, Tuytelaars, Leibe, Van Gool, 3DRR07; RSS08]

Extension: Estimating Articulation

Try to fit a silhouette to the detected person.

Basic idea: search for the silhouette that simultaneously optimizes
- the Chamfer match to the distance-transformed edge image, and
- the overlap with the top-down segmentation.

This enforces global consistency.
Caveat: it reintroduces a reliance on a global model.

[Leibe, Seemann, Schiele, CVPR05]


Extension: Rotation-Invariant Detection

Polar instead of Cartesian voting scheme.

Benefits:
- Recognizes objects under image-plane rotations
- Possibility to share parts between articulations

Caveats:
- Rotation invariance should only be used when it is really needed
  (it also increases false-positive detections)

[Mikolajczyk, Leibe, Schiele, CVPR06]


Sometimes, Rotation Invariance Is Needed

[Mikolajczyk et al., CVPR06]


You Can Try It At Home

Linux binaries available, including datasets and several pre-trained detectors:
http://www.vision.ee.ethz.ch/bleibe/code

Source: Bastian Leibe

Discussion: Implicit Shape Model

Pros:
- Works well for many different object categories, both rigid and articulated
- Flexible geometric model: can recombine parts seen on different training examples
- Learns from relatively few (50-100) training examples
- Optimized for detection; good localization properties

Cons:
- Needs supervised training data: object bounding boxes for detection, reference segmentations for top-down segmentation
- Only weak geometric constraints: resulting segmentations may contain superfluous body parts
- Purely representative model: no discriminative learning

Source: Bastian Leibe



What we will learn today

- Implicit Shape Model
  - Representation
  - Recognition
  - Experiments and results

- Deformable Models
  - The PASCAL challenge
  - Latent SVM Model


Object Detection: the PASCAL Challenge

- ~10,000 images, with ~25,000 target objects
- Objects from 20 categories (person, car, bicycle, cow, table, ...)
- Objects are annotated with labeled bounding boxes

Source: Pedro Felzenszwalb



[Figure: a detection is scored by a root filter, part filters, and deformation models]

Source: Pedro Felzenszwalb

Latent SVM Model: an Overview

- The image is partitioned into 8x8-pixel blocks
- In each block we compute a histogram of gradient orientations
  - Invariant to changes in lighting, small deformations, etc.
- We compute features at different resolutions (a pyramid)

Source: Pedro Felzenszwalb

Histogram of Oriented Gradient (HOG) Features

Filters:
- Filters are rectangular templates defining weights for the features
- The score is the dot product of the filter and a subwindow of the HOG pyramid: the score of filter $W$ at a location in the HOG pyramid $H$ is $W \cdot H$ (a sketch follows below)

Source: Pedro Felzenszwalb
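A hedged sketch of this scoring, assuming one HOG pyramid level is stored as an (H, W, D) array of per-cell histograms; sliding the filter over the level yields a map of dot-product scores.

```python
import numpy as np

def filter_score_map(hog_level, filt):
    """Score of a linear filter at every placement in one HOG pyramid level.
    hog_level: (H, W, D) array of per-cell HOG features
    filt:      (h, w, D) rectangular template of weights
    Returns an (H-h+1, W-w+1) map of dot-product scores."""
    H, W, D = hog_level.shape
    h, w, _ = filt.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            window = hog_level[y:y+h, x:x+w, :]
            scores[y, x] = np.sum(window * filt)  # dot product <W, subwindow of H>
    return scores
```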

Object Hypothesis

- The score is the sum of the filter scores plus the deformation scores (written out below)
- The multiscale model captures features at two resolutions
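Written out, the score of a placement $(p_0, \dots, p_n)$ of the root and part filters has the standard deformable-part-model form (notation assumed here: $F_i$ are the filters, $\phi(H, p_i)$ the HOG features under placement $p_i$, $d_i$ the deformation weights, and $(dx_i, dy_i)$ the displacement of part $i$ from its anchor):

$$\text{score}(p_0, \dots, p_n) \;=\; \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) \;-\; \sum_{i=1}^{n} d_i \cdot \big(dx_i,\, dy_i,\, dx_i^2,\, dy_i^2\big)$$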



Training the Latent SVM Model

- Training data consists of images with labeled bounding boxes
- We need to learn the model structure, the filters, and the deformation costs

Source: Pedro Felzenszwalb



Connection with Linear Classifiers

- The score of the model is the sum of the filter scores plus the deformation scores
- A bounding box in the training data specifies that the score should be high for some placement in a range

Standard SVM: $f_w(x) = w \cdot \Phi(x)$
Latent SVM: $f_w(x) = \max_z\, w \cdot \Phi(x, z)$

Here $w$ is the model (the weight vector: a concatenation of the filters and deformation parameters), $x$ is a detection window, and $z$ are the filter placements; $\Phi(x, z)$ is the concatenation of features and part displacements.


Latent SVMs

The training objective pairs a regularizer with a hinge loss over the labeled examples $(x_i, y_i)$; the $x_i$ and $y_i$ are observed variables, the placements $z$ are latent:

$$\frac{1}{2}\|w\|^2 \;+\; C \sum_{i=1}^{n} \max\!\big(0,\, 1 - y_i f_w(x_i)\big), \qquad f_w(x) = \max_z\, w \cdot \Phi(x, z)$$

The first term is the regularization, the second the hinge loss; $f_w$ is linear in $w$ if $z$ is fixed.


Latent SVM Training

Semi-convex optimization problem:
- A maximum of convex (here linear) functions is convex, so $f_w(x) = \max_z w \cdot \Phi(x, z)$ is convex in $w$
- The hinge loss $\max(0,\, 1 - y_i f_w(x_i))$ is therefore convex if $y_i = -1$, but not convex if $y_i = 1$

Latent SVM Training (cont.)

The problem becomes convex if we fix $z$ for the positive examples: with $z$ fixed, $f_w$ is affine in $w$ for the examples with $y_i = 1$.

Iterative optimization procedure (a sketch follows below):
- Initialize $w$ and iterate:
  - Pick the best $z$ for each positive example
  - Optimize $w$ via gradient descent with data mining
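A hedged sketch of that loop, not the authors' code: `feat` and `placements` are assumed callables, and the hard-negative data mining is reduced to re-scoring all negatives each step.

```python
import numpy as np

def train_latent_svm(positives, negatives, feat, placements, C=0.01,
                     outer_iters=5, sgd_steps=200, lr=1e-3):
    """Coordinate-descent sketch of latent SVM training.
    feat(x, z)    -> feature vector Phi(x, z) (NumPy array)
    placements(x) -> iterable of candidate latent placements z for example x"""
    w = np.zeros_like(feat(positives[0], next(iter(placements(positives[0])))))
    for _ in range(outer_iters):
        # Step 1: fix z for each positive = its best placement under current w.
        pos_phis = [max((feat(x, z) for z in placements(x)),
                        key=lambda phi: float(w @ phi)) for x in positives]
        # Step 2: subgradient descent on 1/2||w||^2 + C * sum of hinge losses.
        for _ in range(sgd_steps):
            grad = w.copy()                            # gradient of the regularizer
            for phi in pos_phis:                       # y = +1, z fixed (affine)
                if w @ phi < 1:
                    grad -= C * phi
            for x in negatives:                        # y = -1, max over z (convex)
                phi = max((feat(x, z) for z in placements(x)),
                          key=lambda p: float(w @ p))  # stand-in for data mining
                if w @ phi > -1:
                    grad += C * phi
            w -= lr * grad
    return w
```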

Latent SVM Training: Initializing w

For a k-component mixture model (a sketch of the first step follows below):
- Split the examples into k sets based on bounding-box aspect ratio
- Learn k root filters using a standard SVM
  - Training data: warped positive examples and random windows from negative images (Dalal & Triggs)
- Initialize the parts by selecting patches from the root filters:
  - Sub-windows with strong coefficients
  - Interpolate to get higher-resolution filters
  - Initialize the spatial model using fixed spring constants
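A small illustrative sketch of the aspect-ratio split; the quantile-based grouping is an assumption, chosen only to show the idea.

```python
import numpy as np

def split_by_aspect_ratio(boxes, k):
    """boxes: (N, 4) array of (x1, y1, x2, y2); returns k index groups."""
    aspect = (boxes[:, 2] - boxes[:, 0]) / (boxes[:, 3] - boxes[:, 1])
    # Split at aspect-ratio quantiles so each component gets ~N/k examples.
    interior = np.quantile(aspect, np.linspace(0, 1, k + 1))[1:-1]
    component = np.digitize(aspect, interior)  # component id in 0..k-1
    return [np.where(component == c)[0] for c in range(k)]
```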

Learned Models

[Figure: learned models]

Source: Pedro Felzenszwalb

Example Results

[Figures: example detections]

More Results

[Figures: more detections]

Quantitative Results

9 systems competed in the 2007 challenge. Out of 20 classes, this system took:
- First place in 10 classes
- Second place in 6 classes

Some statistics:
- It takes ~2 seconds to evaluate a model on one image
- It takes ~3 hours to train a model
- MUCH faster than most systems

Source: Pedro Felzenszwalb

Code for Latent SVM

Source code for the system, and models trained on PASCAL 2006, 2007, and 2008 data, is available at:
http://www.cs.uchicago.edu/~pff/latent

Source: Pedro Felzenszwalb



Summary

- Deformable models provide an elegant framework for object detection and recognition
  - Efficient algorithms for matching models to images
  - Applications: pose estimation, medical image analysis, object recognition, etc.
- We can learn models from partially labeled data
  - Generalizes standard ideas from machine learning
  - Leads to state-of-the-art results in the PASCAL challenge
- Future work: hierarchical models, grammars, 3D objects

Source: Pedro Felzenszwalb

What we have learned today

- Implicit Shape Model
  - Representation
  - Recognition
  - Experiments and results

- Deformable Models
  - The PASCAL challenge
  - Latent SVM Model
