Professional Documents
Culture Documents
November 2, 2017
Outline
What is Object Detection and Segmentation?
Examples
A Thriving Ecosystem
The Atlas
Public Datasets
What is Object Detection?
1
Wikipedia 3/47
What is Image Segmentation?
2
Wikipedia 4/47
Generic Detection and Segmentation
3
Stanford cs231n (2017) 5/47
Examples: Binary Mask
6/47
Examples: Binary Mask
7/47
Examples: Binary Mask
8/47
Examples: Multiclass
9/47
Examples (and More Lingo)
10/47
Boundary Segmentation Examples
11/47
Boundary Segmentation Examples
12/47
Instance Segmentation Examples
13/47
Instance Segmentation Examples
14/47
Image Segmentation Examples
15/47
Image Segmentation Examples
16/47
History: Pre Deep Learning
4
ImageNet: A Large-Scale Hierarchical Image Database
5
The PASCAL Visual Object Classes (VOC) Challenge
6
The PASCAL Challenge: A Retrospective 17/47
Pre Deep Learning Methods
18/47
Common Issues with Algorithms
I Accuracy problems
Huge number of candidate regions inflates false-positive rates
Illumination, occlusion, etc. can confuse test and label process
7
terabyte
8
That is, large dataset likely has many scale variations of same object 19/47
Quality Assessment and Metrics
I Assessing the quality of a classification result is generally well
defined
24/47
Quality Assessment and Metrics
I A common metric is mean average precision (mAP)9
For each class ci , calculate average precision api = AP(ci )
Compute the mean over all api values calculated for each class
26/47
Quality Assessment and Metrics
27/47
Quality Assessment and Metrics
10
Again, see Everingham et al. for additional discussion 29/47
PASCAL VOC2012 Leaderboard
30/47
Early DL Detection and Segmentation
11
LeCun et al., Gradient-Based Learning Applied to Doc Recognition, 1998 31/47
Early DL Solutions: The R-CNN Family
32/47
Early DL Solutions: The R-CNN Family
Figure 21: Uijlings et al., Selective Search for Object Recognition, 2013
34/47
Early DL Solutions: The R-CNN Family
12
He et al., Spatial Pyramid Pooling in Deep Convolutional Networks, 2014 35/47
The R-CNN Family: Fast R-CNN (2015)
I Fast R-CNN is a combines stages 2 and 3 of R-CNN and is
trained with multi-task loss (log loss and smooth L1 loss)
I A Fast R-CNN network takes as input an image and a set of
object proposals
I The network first processes the whole image with conv and
pooling layers to produce a conv feature map.
I For each object proposal an region of interest ROI pooling
layer extracts associated features from the conv map.
I Each feature vector is fed into a fully connected (fc) layer that
finally branches into two sibling output layers:
A softmax layer producing classification labels
A linear layer producing bounding box positions
Can we do better . . . ?
38/47
The R-CNN Family: Faster R-CNN
I Fast R-CNN is still using independent region proposal stage.
I As you might guess, the Faster R-CNN revision will introduce
a Region Proposal Network (RPN) that shares full-image
convolutional features with the detection network.
I The RPN simultaneously predicts object bounds and
objectness scores at each position.
I The RPN is trained end-to-end to generate high-quality region
proposals which are used by Fast R-CNN
I The RPN and Fast R-CNN are merged into a single network
by sharing their convolutional feature. Using “attention”
mechanisms, the RPN component tells unified network where
to look.
I Multi-task training with 4 loss functions (obj/not obj, ROI
bbox, classify, final obj bbox)
39/47
The R-CNN Family: Faster R-CNN
40/47
The R-CNN Family: Faster R-CNN
41/47
The R-CNN Family: Faster R-CNN
42/47
The R-CNN Family: Mask R-CNN
43/47
The R-CNN Family: Mask R-CNN
Results on COCO test images using ResNet-101-FPN at 5 fps.
44/47
Wrapping up: A Thriving Ecosystem
45/47
Detection and Segmentation Atlas
There are too many outstanding contributions to cover in a single deck. Here is a brief
high-level overview intended to provide broader context and help guide additional
exploration.
LeNet (1998)
Learning Deconv Nets (2015) VGG (2014) Overfeat (2013) FCNs (2014) GoogleNetV 1 (2014) Deep MultiBox (2013)
Recombinator Nets (2015) SegNet (2015) Frac Max-Pooling (2014) DeepMask (2015) R-CNN (2013) YOLO (2015) GoogleNetV 2,V 3 (2015) Multi-scale MultiBox (2014)
U-Networks (2015) Dropout Bayes (2015) Kroneker Layer (2015) SharpMask (2016) Fast R-CNN (2015) ResNet (2015) Single-Shot MultiBox (2015)
Bayes SegNet (2015) Dilated Conv (2015) MultiPath (2016) Faster R-CNN (2015) GoogleNetV 4 (2016) Speed/Acc Trade-offs (2016)
DeepLab (2016) FPN (2016) R-FCN (2016) ResNeXt (2016) Spatially Adaptive Nets (2016)
CRFs (2012) Rethinking Atrous (2017) Mask R-CNN (2017) Beyond Skip Conns (2016)
46/47
Public Datasets
Inria Aerial Image Labeling . . . . . . . . . . . . 2017 paper data
DAVIS Challenge . . . . . . . . . . . . . . . . . . . . . . 2017 paper data
Mapillary Vistas . . . . . . . . . . . . . . . . . . . . . . . 2017 paper data
ADE20K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2016 paper data
SYNTHIA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2016 paper data
SpaceNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2016 N/A data
Playing for Data . . . . . . . . . . . . . . . . . . . . . . 2016 paper data
SUN RGB-D . . . . . . . . . . . . . . . . . . . . . . . . . . 2015 paper data
Cityscapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2015 paper data
Common Objects in Context (COCO) 2014 paper data
Oxford RoboCar Dataset . . . . . . . . . . . . . . 2014 paper data
KITTI Vision Benchmark Suite . . . . . . . . . 2012 paper data
Visual Object Classes 2012 (VOC12). . . . 2012 [1][2] data
NYU Depth Dataset V2 . . . . . . . . . . . . . . . 2012 paper data
Caltech Pedestrian Detection Benchmark 2009 [1][2] data
CamVid: Motion-based Segmentation . . . 2008 paper data
47/47