1, JANUARY 2007
INTRODUCTION
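$$P(y \mid x) = \frac{1}{Z} e^{-E(y, x)} \tag{1}$$

$$E(y, x) = \sum_{i} \psi_u(y_i, x_i) + \sum_{i} \sum_{j \in N_i} \psi_p(y_i, y_j, x_i), \qquad Z = \sum_{y \in Y} e^{-E(y, x)}$$

$$y^{*} = \operatorname*{argmin}_{y \in Y} E(y, x) \tag{2}$$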
Efficient optimization methods for discrete labels have been shown to exist for some problem domains, such as binary segmentation and multiclass segmentation. The optimization method is used not only for inference but also for learning the CRF parameters.
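As a minimal sketch of the energy minimization y* = argmin E(y, x), the following Python snippet evaluates a CRF energy on a tiny 4-connected grid and finds the minimizing labeling by exhaustive search. The Potts pairwise term, the `pairwise_weight` parameter, and the function names are assumptions made for illustration. Exhaustive search is only feasible for toy problems and merely stands in for the efficient optimizers mentioned above.

```python
import itertools
import numpy as np

def crf_energy(labels, unary, pairwise_weight=1.0):
    """Sum of unary costs plus a Potts penalty on 4-connected neighbours."""
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    energy = unary[rows, cols, labels].sum()
    # Assumed Potts term: each undirected edge whose endpoints disagree
    # costs pairwise_weight (every edge is counted once here).
    energy += pairwise_weight * np.sum(labels[:, 1:] != labels[:, :-1])
    energy += pairwise_weight * np.sum(labels[1:, :] != labels[:-1, :])
    return float(energy)

def brute_force_map(unary, pairwise_weight=1.0):
    """Enumerate every labeling and keep the one with the lowest energy."""
    h, w, n_labels = unary.shape
    best_labels, best_energy = None, float("inf")
    for flat in itertools.product(range(n_labels), repeat=h * w):
        labels = np.array(flat).reshape(h, w)
        e = crf_energy(labels, unary, pairwise_weight)
        if e < best_energy:
            best_labels, best_energy = labels, e
    return best_labels, best_energy

# Toy 2x2 image with two labels; unary[r, c, l] is the cost of label l.
unary = np.array([[[0.1, 0.9], [0.2, 0.8]],
                  [[0.6, 0.4], [0.9, 0.1]]])
best_labels, best_energy = brute_force_map(unary, pairwise_weight=0.25)
print(best_labels, best_energy)
```

With these toy costs the top row prefers label 0 and the bottom row prefers label 1, and the pairwise weight is small enough that the minimizer keeps that split.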
SALIENCY FILTERS
Saliency, or salience, is the state of standing out and being very easy to see. According to the Longman dictionary, an object is salient when it appears as the most important or noticeable object among other objects. The saliency of an object is interpreted as its visual existence and its state of being the central object. Salient objects tend to have a more compact appearance than background objects. For example, Figure 2a shows a cat that is the main object in the image and dominates the visual perception over all other objects, whereas Figure 2b shows a cat and a baby, where the cat is not a salient object.
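$$U_i = \sum_{j=1}^{N} \lVert c_i - c_j \rVert^2 \cdot w_{ij}^{(p)}, \qquad w_{ij}^{(p)} = \frac{1}{Z_i} \exp\!\left(-\frac{1}{2\sigma_p^2} \lVert p_i - p_j \rVert^2\right) \tag{3}$$

$$D_i = \sum_{j=1}^{N} \lVert p_j - \mu_i \rVert^2 \cdot w_{ij}^{(c)}, \qquad w_{ij}^{(c)} = \frac{1}{Z_i} \exp\!\left(-\frac{1}{2\sigma_c^2} \lVert c_i - c_j \rVert^2\right), \qquad \mu_i = \sum_{j=1}^{N} w_{ij}^{(c)} p_j \tag{4}$$

$$S_i = U_i \cdot \exp(-k \cdot D_i) \tag{5}$$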
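The quantities in Equations 3 to 5 can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function name, the default values of `sigma_p`, `sigma_c` and `k`, and the rescaling of U and D to [0, 1] before they are combined are all assumptions, and the elements are taken to be N regions (for example superpixels), each with a mean colour and a mean position.

```python
import numpy as np

def saliency_filters(colors, positions, sigma_p=0.25, sigma_c=20.0, k=6.0):
    """Per-element saliency S_i = U_i * exp(-k * D_i) for N elements.

    colors:    (N, 3) array of element colours c_i
    positions: (N, 2) array of element positions p_i
    """
    # Squared pairwise distances in colour space and in position space.
    color_d2 = np.sum((colors[:, None, :] - colors[None, :, :]) ** 2, axis=-1)
    pos_d2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)

    # Gaussian weights; dividing each row by its sum plays the role of Z_i.
    w_p = np.exp(-pos_d2 / (2.0 * sigma_p ** 2))
    w_p /= w_p.sum(axis=1, keepdims=True)
    w_c = np.exp(-color_d2 / (2.0 * sigma_c ** 2))
    w_c /= w_c.sum(axis=1, keepdims=True)

    # U_i: colour contrast to other elements, weighted by spatial proximity.
    uniqueness = np.sum(color_d2 * w_p, axis=1)

    # D_i: spatial spread of elements with a similar colour.
    mu = w_c @ positions  # weighted mean position mu_i, shape (N, 2)
    spread = np.sum((positions[None, :, :] - mu[:, None, :]) ** 2, axis=-1)
    distribution = np.sum(spread * w_c, axis=1)

    def rescale(v):
        # Assumed normalisation to [0, 1]; constant vectors map to zeros.
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    return rescale(uniqueness) * np.exp(-k * rescale(distribution))

# Four grey elements in the corners and one red element in the centre.
colors = np.array([[100, 100, 100]] * 4 + [[255, 0, 0]], dtype=float)
positions = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
saliency = saliency_filters(colors, positions)
print(saliency)
```

In this toy input the red centre element is both unique in colour and spatially compact, so it receives the highest saliency, while the widely spread grey elements are suppressed.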
DATASETS
regions from the background. The binary segmentation employs a CRF and the saliency filters. From Equation 5, Si can be regarded as a saliency map that rates the saliency level of every pixel. Figure 5 shows the results of the saliency filters; the saliency map is the image in the middle. The map can be used to segment the salient part of an image through a binary segmentation in which the unary potential is represented by Si [6]. The energy formulation of the CRF is written as follows.
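$$y^{*} = \operatorname*{argmin}_{y \in Y} E(y, l_f, x)$$

$$E(y, l_f, x) = \sum_{i} \psi_{\mathrm{saliency}}(y_i, l_f, x) + \sum_{i} \sum_{j \in N_i} [\![ y_i \neq y_j ]\!]$$

$$\psi_{\mathrm{saliency}}(y_i, l_f, x) = \begin{cases} 1 - \mathrm{saliency}(i, x) & \text{if } y_i = l_f \\ \mathrm{saliency}(i, x) & \text{otherwise} \end{cases}$$

[Figure: diagram with the labels "query: aeroplane", "Google Images", "Saliency Detector", "Labeled Images"]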
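The saliency-based unary potential can be sketched in Python as below. The snippet builds the two unary costs from a saliency map and then minimizes the energy approximately with iterated conditional modes (ICM), a simple local optimizer used here only for illustration in place of whatever solver the paper employs. The function names, the `pairwise_weight` value, the 4-connected neighbourhood, and the convention that label 1 is the foreground label lf are assumptions.

```python
import numpy as np

def saliency_unary(saliency_map):
    """Return (H, W, 2) costs: index 1 is the foreground label, index 0 the rest."""
    s = np.clip(saliency_map, 0.0, 1.0)
    # Cost is 1 - saliency for the foreground label and saliency otherwise.
    return np.stack([s, 1.0 - s], axis=-1)

def icm_segment(saliency_map, pairwise_weight=0.5, n_sweeps=5):
    """Approximate binary segmentation by iterated conditional modes."""
    unary = saliency_unary(saliency_map)
    labels = np.argmin(unary, axis=-1)  # start from the unary-only labeling
    h, w = labels.shape
    for _ in range(n_sweeps):
        changed = False
        for r in range(h):
            for c in range(w):
                costs = unary[r, c].copy()
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        # Potts penalty for disagreeing with this neighbour.
                        costs += pairwise_weight * (np.arange(2) != labels[rr, cc])
                new_label = int(np.argmin(costs))
                if new_label != labels[r, c]:
                    labels[r, c] = new_label
                    changed = True
        if not changed:
            break
    return labels

# Salient left part, background right column, one ambiguous centre pixel.
saliency_map = np.array([[0.9, 0.9, 0.1],
                         [0.9, 0.4, 0.1],
                         [0.9, 0.9, 0.1]])
labels = icm_segment(saliency_map)
print(labels)
```

The centre pixel has saliency 0.4 and would be labeled background by the unary term alone; the pairwise term pulls it into the foreground because three of its four neighbours are foreground.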
EXPERIMENT RESULTS
The experiment mainly consists of two steps, training and testing. In the training phase, a CRF model is learned from the given datasets. In the testing phase, the CRF model is used to predict the labels of unknown samples. There are three experiment scenarios. Each scenario follows the steps described above, yet differs in the composition of the training datasets.
The first scenario compares the performance of two cases: an experiment using the combination of VOC PASCAL 2010 and Google Images, and an experiment using the VOC PASCAL 2010 dataset only.
The second scenario compares the performance of two cases: an experiment using the VOC PASCAL 2010 dataset only and an experiment using Google Images only.
The third scenario compares the performance of several cases, where each case uses a certain number of Google Images only. In this paper, the cases use 600, 700, 800, and 900 Google Images respectively. The performance is measured with averaged class accuracy (abbreviated as CA) and global accuracy (abbreviated as GA).
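The two measures can be sketched in Python as follows. The exact definitions are not spelled out here, so this sketch assumes the common ones: global accuracy is the fraction of all pixels labeled correctly, and averaged class accuracy is the mean over classes of the per-class pixel accuracy. The function names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t, p] counts pixels of true class t that were predicted as class p."""
    idx = y_true.ravel() * n_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)

def global_accuracy(cm):
    """Percentage of all pixels that are labeled correctly."""
    return 100.0 * np.trace(cm) / cm.sum()

def averaged_class_accuracy(cm):
    """Mean of the per-class accuracies, over classes present in the ground truth."""
    totals = cm.sum(axis=1)
    present = totals > 0
    per_class = 100.0 * np.diag(cm)[present] / totals[present]
    return per_class.mean()

# Toy example: class 0 dominates (like background), class 1 is rare.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1])
cm = confusion_matrix(y_true, y_pred, n_classes=2)
print(global_accuracy(cm), averaged_class_accuracy(cm))
```

Under these assumed definitions a dominant class lifts GA well above the averaged CA, which is the same pattern as in the result tables, where the background class is far more accurate than the object classes.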
Table 1 summarizes the result of the first scenario.
Experiment               Averaged CA (%)   GA (%)
CRF+VOC                  11.0450           79.213
CRF+VOC+Google Images    11.7042           79.299

Classes        CRF+VOC    CRF+VOC+Google Images
aeroplane      15.1694    12.3211
bicycle         0.0000     0.9129
bird            3.1685     3.5391
boat            5.3229    10.4128
bottle          0.7876     3.5560
bus            10.5571    14.9576
car            14.3507    16.6133
cat            12.5929    10.6088
chair           4.0795     2.0624
cow             1.6487     6.1896
diningtable     3.9595     4.0120
dog             5.1569     2.2727
horse           4.7258     4.7002
motorbike      14.9855    16.2211
person         24.1621    22.7428
pottedplant     1.1087     0.7029
sheep          11.7376    12.8854
sofa            1.5923     1.2566
train          15.5669    15.4514
tvmonitor       3.4973     6.9930
background     77.7755    77.3758
Averaged CA    11.0450    11.7042
Experiment           Averaged CA (%)   GA (%)
CRF+VOC              11.045            79.213
CRF+Google Images     9.256            73.359

No   Classes        CRF+VOC   CRF+Google Images
1    aeroplane      15.169     8.012
2    bicycle         0.000     0.058
3    bird            3.169     2.138
4    boat            5.323     8.056
5    bottle          0.788     0.000
6    bus            10.557    17.212
7    car            14.351     4.112
8    cat            12.593     9.461
9    chair           4.080     0.554
10   cow             1.649     6.106
11   diningtable     3.960     0.654
12   dog             5.157     1.753
13   horse           4.726     3.891
14   motorbike      14.986     7.364
15   person         24.162     9.664
16   pottedplant     1.109     2.207
17   sheep          11.738    13.959
18   sofa            1.592     0.721
19   train          15.567    17.521
20   tvmonitor       3.497     5.352
21   background     77.776    75.578
     Averaged CA    11.045     9.256
                   Number of Google Images
Classes            600       700       800       900
aeroplane          8.012     7.610    11.095     9.477
bicycle            0.058     0.465     0.880     1.431
bird               2.138     0.743     0.566     0.959
boat               8.056    10.260     7.982     7.810
bottle             0.000     0.041     0.000     1.266
bus               17.212    19.210    19.067    18.783
car                4.112     4.845     6.014     6.013
cat                9.461     7.204     6.695     8.359
chair              0.554     0.260     0.085     0.379
cow                6.106     7.961     5.635     8.121
diningtable        0.654     0.912     1.153     0.623
dog                1.753     3.890     2.775     2.566
horse              3.891     4.651     4.198     5.464
motorbike          7.364     4.887     6.762     5.064
person             9.664     9.677     7.442     7.479
pottedplant        2.207     2.039     3.603     1.691
sheep             13.959    11.578    10.771     9.975
sofa               0.721     0.922     1.242     0.371
train             17.521    17.446    13.478    12.826
tvmonitor          5.352     3.548     5.188     4.485
background        75.578    75.459    75.690    75.170
Averaged CA        9.256     9.219     9.063     8.967
Global Accuracy   73.359    73.503    73.770    73.046
This research proposes Google Images as a training dataset. The Google Images are converted into a strongly labeled dataset by saliency filtering. The performance improvement varies across scenarios. Combining the datasets from both VOC PASCAL 2010 and Google Images increases the prediction accuracy: averaged CA rises from 11.0450% to 11.7042% and GA from 79.213% to 79.299%. The Google
Fig. 7: Examples of results from scenario 2. (a) The original images. (b) The ground truth labeled images. (c) The result from the first experiment (CRF+VOC). (d) The result from the second experiment (CRF+VOC+Google Images).
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, "Saliency filters: Contrast based filtering for salient region detection," in CVPR, 2012, pp. 733-740.
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels," EPFL, Tech. Rep. 149300, June 2010.
P. Kraehenbuehl, "Efficient inference in fully connected CRFs with Gaussian edge potentials," 2014. [Online]. Available: http://graphics.stanford.edu/projects/densecrf/
P. Krähenbühl and V. Koltun, "Efficient inference in fully connected CRFs with Gaussian edge potentials," in Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, Eds. Curran Associates, Inc., 2011, pp. 109-117. [Online]. Available: http://papers.nips.cc/paper/4296-efficient-inference-in-fully-connected-crfs-with-gaussian-edge-potentials.pdf