object detection
Professor Fei-Fei Li
Stanford Vision Lab
Fei-Fei Li
Lecture 17 - 1
18-Nov-11
Object detection
Deformable Models
The PASCAL challenge
Latent SVM Model
Exact correspondences
NN matching
Feature location on obj.
Uniform votes
Quantized Hough array
Test image
B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, International Journal of Computer Vision, Vol. 77(1-3), 2008.
Source: Bastian Leibe
Training images
(+reference segmentation)
Appearance codebook
Image feature f -> Interpretation (codebook match) C_i, with p(C_i | f) -> Object position (o_n, x), with p(o_n, x | C_i, l)
Matched codebook entries -> Probabilistic voting -> 3D voting space (continuous)
[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
Matched codebook entries -> Probabilistic voting -> 3D voting space (continuous) -> Backprojection of maxima -> Backprojected hypotheses
Original image
Interest points
Matched patches
Prob. votes
1st hypothesis
2nd hypothesis
3rd hypothesis
Search window in (x, y)
Source: Bastian Leibe
Scale votes
Binned accum. array
Candidate maxima
Refinement (Mean-Shift)
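The four stages listed above (scale votes, binned accumulator array, candidate maxima, mean-shift refinement) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes weighted votes of the form (x, y, s, w), picks candidates by thresholding bins rather than by a true local-maximum search, and uses a flat (uniform) mean-shift kernel. The function names, bin sizes and bandwidths are hypothetical.

```python
from collections import defaultdict

def bin_votes(votes, bin_size):
    """Accumulate weighted (x, y, s, w) votes into a coarse 3D array of bins."""
    acc = defaultdict(float)
    for x, y, s, w in votes:
        key = (int(x // bin_size[0]), int(y // bin_size[1]), int(s // bin_size[2]))
        acc[key] += w
    return acc

def candidate_maxima(acc, bin_size, min_weight):
    """Return centres of bins whose accumulated weight reaches min_weight.
    (Simplification: thresholding instead of a local-maximum search.)"""
    return [tuple((k + 0.5) * b for k, b in zip(key, bin_size))
            for key, total in acc.items() if total >= min_weight]

def mean_shift(votes, start, bandwidth, iters=50, tol=1e-6):
    """Refine a candidate on the raw, unbinned votes with a flat kernel."""
    cx, cy, cs = start
    for _ in range(iters):
        sw = sx = sy = ss = 0.0
        for x, y, s, w in votes:
            # Normalised squared distance; the window is an ellipsoid in (x, y, s).
            d2 = (((x - cx) / bandwidth[0]) ** 2
                  + ((y - cy) / bandwidth[1]) ** 2
                  + ((s - cs) / bandwidth[2]) ** 2)
            if d2 <= 1.0:
                sw += w
                sx += w * x
                sy += w * y
                ss += w * s
        if sw == 0.0:
            break  # no votes inside the window: keep the current estimate
        nx, ny, ns = sx / sw, sy / sw, ss / sw
        shift = abs(nx - cx) + abs(ny - cy) + abs(ns - cs)
        cx, cy, cs = nx, ny, ns
        if shift < tol:
            break
    return (cx, cy, cs)

votes = [(10, 10, 1.0, 1.0), (11, 10, 1.0, 1.0), (10, 11, 1.1, 1.0), (50, 50, 2.0, 1.0)]
acc = bin_votes(votes, (8, 8, 0.5))
candidates = candidate_maxima(acc, (8, 8, 0.5), min_weight=2.0)
peaks = [mean_shift(votes, c, (8, 8, 0.5)) for c in candidates]
```

Binning keeps the search for candidates cheap, while the refinement on the raw votes avoids the quantization error of the bin grid.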
Detection Results
Qualitative Performance
Recognizes different kinds of objects
Robust to clutter, occlusion, noise, low contrast
Figure-Ground Segregation
What happens first: segmentation or recognition?
Problem extensively studied in psychophysics
Experiments with ambiguous figure-ground stimuli
Results:
Evidence that object recognition can and does operate before figure-ground organization
Interpreted as Gestalt cue familiarity.
M.A. Peterson, Object Recognition Processes Can and Do Operate Before Figure-Ground Organization, Cur. Dir. in Psych. Sc., 3:105-111, 1994.
Matched codebook entries -> Probabilistic voting -> 3D voting space (continuous) -> Backprojection of maxima -> Backprojected hypotheses -> Segmentation (p(figure) probabilities)
[Leibe, Leonardis, Schiele, SLCV04; IJCV08]
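\[ p(f, l \mid o_n, x) = \frac{\sum_i p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)} \]
\[ p(\mathbf{p} = \text{figure} \mid o_n, x) = \sum_{\mathbf{p} \in (f, l)} p(\mathbf{p} = \text{figure} \mid f, l, o_n, x)\, p(f, l \mid o_n, x) \]
First factor: segmentation information
Second factor: influence on object hypothesis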
1. Voting
2. Mean-shift search
3. Backprojection
Matching probability: p(C_i | f)
Occurrence distribution: p(o_n, x | C_i, l)
\[ p(o_n, x \mid f, l) = \sum_i p(C_i \mid f)\, p(o_n, x \mid C_i, l) \]
C: activated codebook entries, with distance threshold \theta
\[ p(C_i \mid f) = \frac{1}{|C|}, \quad \text{where } C = \{ C_i \mid d(C_i, f) \le \theta \} \]
\[ p(o_n, x \mid C_i, l) = \frac{1}{\#\text{occurrences}(C_i)} \]
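The voting step with uniform matching and occurrence probabilities can be sketched as below. This is an illustration under stated assumptions, not the authors' code: features and codebook entries are plain numeric tuples compared with Euclidean distance, occurrences are stored as (dx, dy) offsets to the object centre, scale is ignored, and every codebook entry is assumed to have at least one occurrence. The names `activated_entries` and `cast_votes` and the threshold `theta` are hypothetical.

```python
import math

def activated_entries(f, codebook, theta):
    """Indices of codebook entries within distance theta of feature f."""
    return [i for i, c in enumerate(codebook) if math.dist(c, f) <= theta]

def cast_votes(f, loc, codebook, occurrences, theta):
    """Return ((x, y), weight) votes for one feature f observed at location loc."""
    active = activated_entries(f, codebook, theta)
    votes = []
    for i in active:
        if not occurrences[i]:
            continue  # assumption: entries without stored occurrences cast no votes
        p_match = 1.0 / len(active)        # p(C_i | f): uniform over activated entries
        p_occ = 1.0 / len(occurrences[i])  # p(o_n, x | C_i, l): uniform over occurrences
        for dx, dy in occurrences[i]:
            votes.append(((loc[0] + dx, loc[1] + dy), p_match * p_occ))
    return votes

codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
occurrences = [[(10, 0), (12, 0)], [(-5, 3)], [(0, 0)]]
votes = cast_votes((0.5, 0.0), (100, 50), codebook, occurrences, theta=0.6)
```

With these definitions the weights cast by a single feature sum to one, so every sampled feature has the same total influence on the voting space.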
\[ p(f, l \mid o_n, x) = \frac{p(o_n, x \mid f, l)\, p(f, l)}{p(o_n, x)} = \frac{\sum_i p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)} \]
p(f, l): indicator variable for sampled features
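Backprojection asks how much each sampled feature contributed to a given hypothesis. A minimal sketch, assuming 2D votes, a circular window of hypothetical radius `radius` around the refined maximum in place of the mean-shift kernel, p(f, l) = 1 for every sampled feature, and the total support inside the window standing in for the normalizer p(o_n, x):

```python
def backproject(votes_per_feature, hypothesis, radius):
    """votes_per_feature: {feature_id: [((x, y), weight), ...]}.
    Returns {feature_id: normalised contribution to the hypothesis}."""
    hx, hy = hypothesis
    support = {}
    for fid, votes in votes_per_feature.items():
        # Sum the weights of this feature's votes that fall inside the window.
        s = sum(w for (x, y), w in votes
                if (x - hx) ** 2 + (y - hy) ** 2 <= radius ** 2)
        if s > 0:
            support[fid] = s
    total = sum(support.values())
    if total == 0:
        return {}
    return {fid: s / total for fid, s in support.items()}

votes_per_feature = {
    "a": [((10, 10), 0.5), ((40, 40), 0.5)],
    "b": [((11, 10), 1.0)],
    "c": [((80, 80), 1.0)],
}
influence = backproject(votes_per_feature, hypothesis=(10, 10), radius=3)
```

Features whose votes all land elsewhere (here "c") do not appear in the result, which is what lets the hypothesis be traced back to its supporting image patches.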
\[ p(\mathbf{p} = \text{figure} \mid o_n, x) = \sum_{\mathbf{p} \in (f, l)} \sum_i p(\mathbf{p} = \text{fig.} \mid o_n, x, C_i, l)\, \frac{p(o_n, x \mid C_i, l)\, p(C_i \mid f)\, p(f, l)}{p(o_n, x)} \]
p(p = fig. | o_n, x, C_i, l): Fig./Gnd. label for each occurrence
Sum over i: marginalize over all codebook entries matched to f
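The per-pixel figure probability can be accumulated by pasting each contributing occurrence's stored figure-ground mask into the image, weighted by that occurrence's backprojected influence. A minimal sketch, assuming the weights have already been computed, masks are small 2D lists with values in [0, 1], and the final labelling simply compares the two maps (all names here are hypothetical):

```python
def figure_ground_maps(contributions, height, width):
    """contributions: list of (top, left, mask, weight).
    mask: 2D list of figure labels in [0, 1] stored with a codebook occurrence.
    weight: that occurrence's influence on the hypothesis."""
    p_fig = [[0.0] * width for _ in range(height)]
    p_gnd = [[0.0] * width for _ in range(height)]
    for top, left, mask, weight in contributions:
        for r, row in enumerate(mask):
            for c, m in enumerate(row):
                y, x = top + r, left + c
                if 0 <= y < height and 0 <= x < width:
                    p_fig[y][x] += weight * m
                    p_gnd[y][x] += weight * (1.0 - m)
    return p_fig, p_gnd

def segment(p_fig, p_gnd):
    """Label a pixel as figure where p(figure) exceeds p(ground)."""
    return [[f > g for f, g in zip(frow, grow)]
            for frow, grow in zip(p_fig, p_gnd)]

contributions = [
    (0, 0, [[1, 0], [1, 1]], 0.5),
    (0, 1, [[1, 1], [0, 0]], 0.25),
]
p_fig, p_gnd = figure_ground_maps(contributions, height=2, width=3)
labels = segment(p_fig, p_gnd)
```

Pixels covered by no contributing patch get zero in both maps and are labelled ground by this comparison.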
Original image
p(figure)
p(ground)
Segmentation
Office chairs
Source: Bastian Leibe
Left camera, 1175 frames
Battery of 5 ISM detectors for different car views
Test
Output
Basic idea
Search for the silhouette that simultaneously optimizes:
Chamfer match to the distance-transformed edge image
Overlap with the top-down segmentation
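The Chamfer term can be illustrated with a small sketch: compute a distance transform of the binary edge image once, then score a silhouette template at an offset by averaging the distances under its points. The assumptions here are a city-block metric (computed exactly by a two-pass scan), a template given as a list of (row, column) points, and translation-only search. The overlap term with the top-down segmentation is not shown, and the function names are hypothetical.

```python
def distance_transform(edges):
    """Two-pass city-block distance to the nearest edge pixel.
    edges: 2D list of 0/1 values."""
    h, w = len(edges), len(edges[0])
    inf = h + w  # larger than any possible city-block distance in the image
    dt = [[0 if edges[y][x] else inf for x in range(w)] for y in range(h)]
    for y in range(h):                      # forward pass: top-left to bottom-right
        for x in range(w):
            if y > 0:
                dt[y][x] = min(dt[y][x], dt[y - 1][x] + 1)
            if x > 0:
                dt[y][x] = min(dt[y][x], dt[y][x - 1] + 1)
    for y in reversed(range(h)):            # backward pass: bottom-right to top-left
        for x in reversed(range(w)):
            if y < h - 1:
                dt[y][x] = min(dt[y][x], dt[y + 1][x] + 1)
            if x < w - 1:
                dt[y][x] = min(dt[y][x], dt[y][x + 1] + 1)
    return dt

def chamfer_cost(dt, template, offset):
    """Mean distance from the translated template points to the nearest edge."""
    oy, ox = offset
    h, w = len(dt), len(dt[0])
    total = 0
    for ty, tx in template:
        y, x = ty + oy, tx + ox
        if not (0 <= y < h and 0 <= x < w):
            return float("inf")  # template leaves the image at this offset
        total += dt[y][x]
    return total / len(template)

edges = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
template = [(0, 0), (0, 1), (1, 0)]
dt = distance_transform(edges)
offsets = [(oy, ox) for oy in range(3) for ox in range(3)]
best = min(offsets, key=lambda o: chamfer_cost(dt, template, o))
```

Because the distance transform is computed once per image, evaluating many candidate silhouettes costs only one lookup per template point.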
Benefits:
Recognize objects under image-plane rotations
Possibility to share parts between articulations.
Caveats:
Rotation invariance should only be used when it's really needed.
(It also increases false-positive detections.)
[Mikolajczyk, Leibe, Schiele, CVPR06]
Cons:
Needs supervised training data
Object bounding boxes for detection
Reference segmentations for top-down segm.
Deformable Models
The PASCAL challenge
Latent SVM Model
Object Detection: the PASCAL Challenge
~10,000 images, with ~25,000 target objects.
detection
root filter
part filters
deformation models
Filters
Filters are rectangular templates defining weights for features.
The score is the dot product of the filter and a subwindow of the HOG pyramid.
HOG pyramid
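The filter score described above is a plain dot product between the filter weights and the feature cells under it. A minimal sketch for a single pyramid level, assuming the feature map is a nested list of shape rows x columns x feature dimension (toy 2-dimensional cells here, far smaller than real HOG cells) and hypothetical function names:

```python
def filter_score(feature_map, filt, top, left):
    """Dot product of a rectangular filter with the subwindow at (top, left)."""
    score = 0.0
    for dy, filt_row in enumerate(filt):
        for dx, filt_cell in enumerate(filt_row):
            cell = feature_map[top + dy][left + dx]
            score += sum(a * b for a, b in zip(filt_cell, cell))
    return score

def score_map(feature_map, filt):
    """Scores for every placement where the filter fits inside the feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    h, w = len(filt), len(filt[0])
    return [[filter_score(feature_map, filt, y, x)
             for x in range(cols - w + 1)]
            for y in range(rows - h + 1)]

feature_map = [[[1, 0], [0, 1], [1, 1]],
               [[0, 0], [1, 0], [0, 1]]]
filt = [[[1, 0], [0, 2]]]          # a 1 x 2 filter over 2-dimensional cells
scores = score_map(feature_map, filt)
```

Running the same computation at every level of the pyramid gives scores for the filter at every position and scale.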
Object Hypothesis
Training
Latent SVM
Weight vector: w is a model
x is a detection window
z are filter placements
Features: concatenation of features and part displacements
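With w, x and z as defined above, a latent SVM scores a window by maximizing a linear score over the latent placements. The sketch below assumes that standard form, score = max over z of w . Phi(x, z), and uses a deliberately tiny, hypothetical feature function: x is a 1D list of appearance responses, z is a single part position, and Phi concatenates the response at z with displacement terms relative to an assumed anchor.

```python
ANCHOR = 2  # hypothetical ideal part position, for illustration only

def toy_phi(x, z):
    """Concatenate an appearance feature with part displacement features."""
    d = z - ANCHOR
    return [x[z], float(d), float(d * d)]

def latent_score(w, phi, x, placements):
    """Return (best score, best placement) of max over z of w . phi(x, z)."""
    best_z, best = None, float("-inf")
    for z in placements:
        s = sum(wi * fi for wi, fi in zip(w, phi(x, z)))
        if s > best:
            best_z, best = z, s
    return best, best_z

w = [1.0, 0.0, -0.5]              # reward appearance, penalise squared displacement
x = [0.1, 0.9, 0.2, 0.3, 1.5]
score, z_star = latent_score(w, toy_phi, x, range(len(x)))
```

In this toy case the strongest response (position 4) loses to position 1 because the displacement penalty outweighs it, which is the trade-off a deformation model encodes.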
Latent SVMs
Linear in w if z is fixed
Regularization
Observed variables
Latent variables
Hinge Loss
The score, a maximum over z of functions linear in w, is convex in w.
Hinge loss: convex if the label = -1 (Convex!); not convex if the label = 1.
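The convexity remarks above can be checked numerically on a toy example. The sketch assumes the usual latent SVM objective (L2 regularization plus a hinge loss on the latent score); the names and the one-dimensional feature candidates are hypothetical. With candidates [1] and [-1] the score is |w|, so the positive-label loss max(0, 1 - |w|) has a bump at w = 0, while the negative-label loss 1 + |w| is convex.

```python
def score(w, candidates):
    """Latent score: max over latent choices of w . phi, phi taken from candidates."""
    return max(sum(wi * fi for wi, fi in zip(w, phi)) for phi in candidates)

def hinge(w, candidates, y):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score(w, candidates))

def objective(w, examples, C):
    """0.5 * ||w||^2 + C * sum of hinge losses; examples: [(candidates, y), ...]."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(hinge(w, cands, y) for cands, y in examples)

cands = [[1.0], [-1.0]]            # two latent choices, so score(w) = |w|
a, b, mid = [-1.0], [1.0], [0.0]

# Positive example: the midpoint loss exceeds the average of the endpoint losses.
pos_violates = hinge(mid, cands, 1) > 0.5 * (hinge(a, cands, 1) + hinge(b, cands, 1))

# Negative example: the midpoint loss stays below the average, as convexity requires.
neg_ok = hinge(mid, cands, -1) <= 0.5 * (hinge(a, cands, -1) + hinge(b, cands, -1))
```

This is why training typically alternates: fixing z for the positive examples makes their scores linear in w, and the whole objective becomes convex in w for that step.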
Learned Models
Example Results
More Results
Quantitative Results
9 systems competed in the 2007 challenge.
Out of 20 classes:
First place in 10 classes
Second place in 6 classes
Some statistics:
It takes ~2 seconds to evaluate a model in one image.
It takes ~3 hours to train a model.
MUCH faster than most systems.
Source: Pedro Felzenszwalb
Summary
Deformable models provide an elegant framework for object detection and recognition.
Efficient algorithms for matching models to images.
Applications: pose estimation, medical image analysis, object recognition, etc.