
Scale Adaptive Visual Attention Detection by Subspace Analysis

Yiqun Hu, Deepu Rajan and Liang-Tien Chia


School of Computer Engineering
Nanyang Technological University, Singapore, 639798
{y030070, asdrajan, asltchia}@ntu.edu.sg

ABSTRACT

We describe a method to extract visual attention regions in images by robust subspace analysis from simple features, such as intensity, endowed with scale adaptivity in order to represent textured areas in an image. The scale-adaptive descriptor is mapped onto clusters in linear subspaces. A new subspace estimation algorithm based on Generalized Principal Component Analysis (GPCA) is proposed to estimate multiple linear subspaces. The visual attention of each region is calculated using a new region attention measure that considers feature contrast and spatial geometric properties. Compared with existing visual attention detection methods, the proposed method directly measures global visual attention at the region level as opposed to the pixel level.

Categories and Subject Descriptors
I.4.8 [Image Processing and Computer Vision]: Scene Analysis

General Terms
Algorithms

Keywords
visual attention, scale-selection, GPCA

1. INTRODUCTION

Visual attention detection in images is useful in several applications such as surveillance and image retrieval. The use of features that exhibit sufficient local contrast enables detection of visual attention regions (VARs) in images in which the VARs are easily identifiable. For instance, a tiger in a dense forest can be easily identified as a VAR using current visual attention detection methods [3]. However, a more challenging scenario is one in which the VAR is not easily distinguishable from the background, as in highly textured images; for example, a zebra in the woods. In order to solve this problem, we propose that the features be endowed with the ability to automatically ascertain the scale at which they should be observed so as to be useful in VAR detection. Texture features such as Gabor features [5] reside in high-dimensional feature spaces, and their extraction requires a priori information, such as directions for Gabor transforms and distances in the case of co-occurrence matrices [1]. Moreover, the problem that we address in this paper is not texture segmentation, but the detection of salient regions as determined by the human visual system.

We have proposed a framework [2] to detect attention region(s) by mapping color information of image patches onto a collection of linear subspaces so that each subspace corresponds to region(s) with homogeneous color. The nearest neighbor GPCA (NN-GPCA) was proposed to robustly extract those subspaces with cluster(s) corresponding to VARs. To calculate the attention of a region, the cumulative projection was used as a region-based attention measure. In this paper, we extend this framework to detect visual attention regions using only intensity. The average intensities of scale-adaptive patches are mapped onto linear subspaces (Sec. 2). An iterative refinement scheme is applied to NN-GPCA to remove the estimation bias when detecting linear subspaces with cluster(s) (Sec. 3). We also explicitly consider a cluster factor in the region attention measure to remove spurious noise (Sec. 4). Experiments on both synthetic and real data illustrate the promising results of the proposed method (Sec. 5).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'07, September 23–28, 2007, Augsburg, Bavaria, Germany.
Copyright 2007 ACM 978-1-59593-701-8/07/0009 ...$5.00.

2. SCALE-SENTIENT REPRESENTATION

In [2], a simple polar transformation was proposed to transfer image color information into a collection of linear subspaces so that each subspace corresponds to a region with homogeneous color. Homogeneity of image regions is not only in color but also in texture. Here, we introduce a scale-sentient mean feature that maps texture regularity and proximity into the linearity and clustering properties of subspaces.

2.1 Adaptive Scale Selection

Patches of regular texture can display homogeneity if these patches are selected to correspond to a "texton". The scale of these patches can differ, especially for near-regular texture. This requires that the feature extraction be scale-sentient. Here, we use the automatic scale selection method in [4] to locate the patches and decide their characteristic scales. The Gaussian scale-space of an image is generated by convolving the image with a set of Gaussian kernels of increasing variance. The determinant ||H_norm L|| of the normalized Hessian matrix is used to select the locations where ||H_norm L|| is a local maximum in both the spatial and scale domains. The characteristic scale of such a patch is the scale that achieves the local maximum in the scale domain.

Figure 1: Illustration of local scale estimation on a synthetic image (Gaussian scale space shown at scale 1, s = 2, and scale 4, s = 8).

Figure 1 illustrates the process of scale selection on a synthetic image. The image consists of nine blobs of four different radii together with additive white Gaussian noise of variance 0.1. We calculate ||H_norm L|| at each location and at every scale (the selected scales are shown for two interest points). The local maxima of these values in the scale-space volume are ranked, and the top 9 values indicate the locations of interest points in the image. The scales associated with those 9 values reflect the characteristic sizes of the blobs. The interest points are represented in 3-D (x-y-scale) space, and a ball is drawn at the location of each local maximum with radius proportional to the scale.

2.2 Scale-adaptive Feature Extraction

Returning to the extraction of scale-sentient features, we carry out the above scale-space analysis only on the intensity channel of an image. This results in the identification of interest points and their associated scales. The feature used for further analysis in the VAR extraction algorithm is then simply the mean of the intensity over a square window that is centered at the interest point and whose size is determined by the scale. At those locations where no interest point was detected at any scale, the mean is computed over an 8 × 8 neighborhood. This implies that the windows can overlap. The mean as a feature is attractive because of its simplicity and the adaptive scale at which it is computed. The adaptivity ensures that the texture regularity of image regions is captured in the feature. The polar transformation converts such descriptors into multiple linear subspaces.

3. UNBIASED SUBSPACE ESTIMATION

3.1 Review of NN-GPCA

NN-GPCA is an extension of the original GPCA [6], a geometric approach for estimating multiple linear subspaces with clusters without segmentation. GPCA considers the problem of estimating multiple linear subspaces as that of solving for c from the linear system formed by the Veronese map [6]

p_n(x) = v_n(x)^T c = Σ c_{n1…nK} x_1^{n1} ⋯ x_K^{nK} = 0,    (1)

and then solving for the normal vector b of every subspace from c. NN-GPCA improves robustness by considering the clustering constraint of the subspaces. A weight matrix W, calculated from k-nearest-neighbor analysis, is utilized in the weighted least squares estimation of c from W L_n c = 0. Thus the effect of those outliers that do not form a cluster in the linear subspace(s) is reduced.

Although NN-GPCA improves the robustness of GPCA by reducing the effect of outliers, the problem of estimation bias due to subspace skew remains. Subspace skew is the phenomenon by which one of the subspaces is dominant (in terms of the number of data points in it) compared to the other subspaces. Since the objective function in the optimization problem of both GPCA and NN-GPCA consists of the sum of approximation errors, for a fixed number of subspaces the estimation will be biased towards those subspaces that are more heavily populated. Fig. 2 (a)-(b) show an example of this problem, in which the estimates are biased towards the subspace with more data points.

3.2 Iterative Refinement of c

In both GPCA and NN-GPCA, multiple linear subspaces are transformed into a single linear model in the high-dimensional embedding space via the Veronese map. The least squares estimate in the high-dimensional space only minimizes the sum of squared errors in the embedding space and does not necessarily minimize the error in the low-dimensional space. Hence, we propose an iterative refinement algorithm that adjusts the weight of each data point to solve the dominance bias problem of NN-GPCA.

We begin the iteration with the coefficient vector c obtained from the weighted least squares approximation as in NN-GPCA. The error of each data point x_i is then calculated according to

e_i = W(x_i) × |v_n(x_i)^T c_0| / ||v_n(x_i)||,    (2)

where W(x_i) is the weight of x_i, c_0 is the initial estimate of c, |·| is the absolute value of the projection of v_n(x_i) onto

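As a concrete illustration of the adaptive scale selection of Sec. 2.1, the following sketch ranks scale-normalized determinant-of-Hessian responses over the (x, y, scale) volume in the spirit of [4]. This is our own minimal sketch, not the authors' implementation: it assumes SciPy's `gaussian_filter`, a simple finite-difference Hessian, and the hypothetical function name `scale_space_interest_points`.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space_interest_points(image, sigmas, top_k=9):
    """Interest points and characteristic scales from ranked maxima of the
    scale-normalized determinant of the Hessian (cf. Lindeberg [4])."""
    response = np.zeros((len(sigmas),) + image.shape)
    for i, s in enumerate(sigmas):
        L = gaussian_filter(image.astype(float), s)
        # finite-difference second derivatives of the smoothed image
        Lyy = np.gradient(np.gradient(L, axis=0), axis=0)
        Lxx = np.gradient(np.gradient(L, axis=1), axis=1)
        Lxy = np.gradient(np.gradient(L, axis=0), axis=1)
        # scale normalization: each second derivative carries a factor s^2
        response[i] = (s ** 4) * (Lxx * Lyy - Lxy ** 2)
    points = []
    # rank responses over the whole volume; keep spatially distinct maxima
    for idx in np.argsort(response, axis=None)[::-1]:
        si, yi, xi = np.unravel_index(idx, response.shape)
        if all(abs(yi - y) > 2 or abs(xi - x) > 2 for _, y, x in points):
            points.append((sigmas[si], yi, xi))
        if len(points) == top_k:
            break
    return points
```

For a single synthetic Gaussian blob, the top-ranked maximum lands at the blob's location with the scale closest to the blob's own, mirroring the nine-blob example of Figure 1.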
[Figure 2 shows six scatter plots; axis-tick residue removed.]

Figure 2: Effect of subspace skew on the subspace estimation of GPCA and the improvement of the proposed method: (a) synthetic data; (b) initial estimate of GPCA; (c) final estimate of GPCA after optimization; (d) estimate after the 3rd iteration; (e) initial estimate after convergence; (f) final estimate after optimization.
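To make the estimation concrete, here is a minimal sketch of the degree-2 Veronese embedding, the weighted least-squares solution of W L_n c = 0, and the iterative reweighting of equations (2)-(3), for the toy case of two lines through the origin in R^2 (n = 2, K = 2). The helper names `veronese2`, `fit_c`, and `refine_c` are ours, assumed for illustration; this is a sketch, not the paper's implementation.

```python
import numpy as np

def veronese2(X):
    # degree-2 Veronese map of 2-D points: v_2(x, y) = (x^2, xy, y^2)
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x * x, x * y, y * y])

def fit_c(Ln, W):
    # weighted least squares: c is the smallest right singular vector
    # of diag(W) Ln, i.e. the solution of W L_n c = 0
    _, _, Vt = np.linalg.svd(W[:, None] * Ln)
    return Vt[-1]

def refine_c(Ln, W, eps=1e-9, max_iter=50):
    """Iterative refinement: residuals e_i (eq. 2) are added to the
    weights (eq. 3) until the sum and variance of errors stabilize."""
    norms = np.maximum(np.linalg.norm(Ln, axis=1), 1e-12)
    prev = (np.inf, np.inf)
    for _ in range(max_iter):
        c = fit_c(Ln, W)
        e = W * np.abs(Ln @ c) / norms      # eq. (2)
        W = W + e                           # eq. (3)
        cur = (e.sum(), e.var())            # stopping statistics s_e, v_e
        if abs(cur[0] - prev[0]) < eps and abs(cur[1] - prev[1]) < eps:
            break
        prev = cur
    return c, W
```

For the lines y = 0 and y = x, the union is the zero set of p(x, y) = y(x − y) = xy − y^2, so the recovered c should be proportional to (0, 1, −1) and vanish on all embedded points.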

c_0, and ||·|| is the length of the vector v_n(x_i). The weight of each data point is then updated as

W^{p+1}(x_i) = W^p(x_i) + e_i,    (3)

where p is the iteration number. The new value of c is obtained from the weighted least squares solution of W L_n c = 0. During the initial steps of the iteration, the weights of the data points in the smaller subspace change more than those in the dominant subspace. As the iteration proceeds, the weights in the smaller subspace change less as the error decreases. The cumulative effect of the change in weights over all the subspaces is reflected in the sum and variance of the errors. Hence the sum (s_e) and variance (v_e) of the errors serve as the stopping criterion (rather than the errors of individual data points); i.e., if |s_e^{p+1} − s_e^p| < ε ∧ |v_e^{p+1} − v_e^p| < ε′, the iteration stops.

The proposed refinement process iteratively equalizes the importance of the inliers in the embedding space by adjusting their weights according to the residuals of the current estimate. Notice that since the initial weights of the outliers, calculated from the k nearest neighbors, are small, they are not increased significantly in the refinement process; thus, effectively only the inliers are considered. It was found through experiments that the iterative refinement of c converged in only about 5 to 10 iterations. By applying the proposed refinement process to the synthetic data in Fig. 2 (a), we can see from Fig. 2 (d)-(e) that both subspaces can be accurately estimated.

4. SUBSPACE-BASED ATTENTION

In [2], we proposed using the Cumulative Projection (CP) to measure the attention of every subspace. The CP for a subspace is the sum of the projections of all inliers onto its normal. This attention measure considers not only the feature contrast but also the spatial geometric properties (e.g., size and location) of the corresponding region. However, some spurious regions of noise data may achieve a high CP value because of their small sizes. In this section, we explicitly integrate the clustering constraint with CP to reduce the effect of noise through a measure called cluster compactness.

Cluster Growing

Due to the clustering constraint, points that do not lie in any cluster are noise and should be removed before calculating the cluster compactness. We present a cluster growing algorithm to remove this noise. We sort the points according to their weights and pick the top m points to form an initial set of inliers. In the case of multiple clusters within a subspace, these m points will naturally come from each of the clusters, which implies that the initial set contains points drawn from all regions falling on that subspace. A normalized average minimum distance d̄ is defined as d̄ = D/W, where D is the sum over all inliers of the distance to the nearest other inlier, and W is the sum of the weights of all inliers. Given a data point x_i, we calculate its distance d(x_i) to the nearest inlier. Only when

d(x_i) / W(x_i) ≤ u × d̄    (4)

is the data point x_i included as an inlier. Here, u is a constant that extends the threshold beyond the average minimum distance; from experiments, it was observed that the choice of u is not critical. Normalization by the weight of each data point increases the discriminatory power across different images, since a small (large) weight associated with noise (an inlier) causes the left-hand side of equation (4) to be even larger (smaller). When all data points have been considered, the algorithm returns the set of inliers that form clusters in each estimated subspace.

Cluster Compactness

We now define the cluster compactness for a subspace with normal b_j as

CC(b_j) = d̄^cl / d^cl_{b_j},    (5)

where d^cl_{b_j} is the mean of the k-nearest-neighbor distances (kNND) of all inliers of the subspace with normal b_j, and d̄^cl is the mean of d^cl_{b_j} over all subspaces. CC(b_j) measures the compactness of the cluster(s) in one subspace compared to the other subspaces.
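The cluster growing rule of equation (4) can be sketched as follows. This is a simplified, assumed implementation: `grow_clusters` is our name, and d̄ is recomputed as inliers are admitted, a detail the paper does not specify.

```python
import numpy as np

def grow_clusters(points, weights, m=5, u=3.0):
    """Cluster growing: seed with the top-m weighted points, then admit a
    point as an inlier when d(x_i) / W(x_i) <= u * d_bar (eq. 4)."""
    order = np.argsort(weights)[::-1]
    inliers = list(order[:m])

    def d_bar(idx):
        # normalized average minimum distance d_bar = D / W (Sec. 4)
        P = points[idx]
        diff = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
        np.fill_diagonal(diff, np.inf)
        return diff.min(axis=1).sum() / weights[idx].sum()

    for i in order[m:]:
        # distance from candidate x_i to its nearest current inlier
        d_i = np.min(np.linalg.norm(points[inliers] - points[i], axis=1))
        if d_i / weights[i] <= u * d_bar(inliers):
            inliers.append(i)
    return sorted(inliers)
```

As in the text, a distant point with a small weight inflates the left-hand side of (4) and is rejected, while nearby high-weight points are absorbed into the inlier set.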
This measure is used in conjunction with the Cumulative Projection to assign attention measures to regions in the image as Attention = CC(b_j) × CP(b_j), where CP(b_j) is the projection of the data points onto the normal b_j of the subspace, referred to earlier as the cumulative projection.

5. EXPERIMENTAL EVALUATION

To evaluate the performance of the proposed method, we conduct experiments on both synthetic data and real images. On synthetic data, the proposed refinement process is shown to remove the bias due to subspace skew in GPCA; on real images, the proposed framework achieves promising results for detecting attention regions, especially in the challenging scenario of textured regions.

5.1 Synthetic Data

To evaluate robustness to subspace skew, we generate n = 2 subspaces with very different populations and add Gaussian noise with variance 10 to the sample points. We test the algorithm for different ratios of the number of data points in one subspace to that in the other, starting from a ratio of 1:1 up to 10:1. The estimation was run 100 times for each ratio and the error was calculated as

error = (1/n) Σ_{i=1}^{n} cos^{-1}(b_i^T b̃_i).

Fig. 3 shows that GPCA and NN-GPCA achieve the same performance for small ratios, but as the ratio increases, the NN-GPCA algorithm far outperforms the GPCA algorithm.

[Figure 3 plot residue removed; the plot compares GPCA and NN-GPCA estimation errors over skew ratios 1:2 to 1:10.]

Figure 3: Estimation error vs. ratio of subspace skew for NN-GPCA without and with the refinement of c.

5.2 Real Images

We also test the performance of the proposed framework for detecting VARs on real images. Three examples are shown in Fig. 4. The top row shows the original images. The patches whose polar representations belong to the detected most salient attention subspace are marked as bright pixels in the middle row, and the corresponding bounding boxes are shown in the bottom row. The experimental results show that the VARs have been successfully detected.

Figure 4: Examples of VAR detection using the proposed method. Top row: original images; middle row: patches whose corresponding subspace ranked high; bottom row: detected VARs.

6. CONCLUSION

In this paper, we enhance the previously proposed NN-GPCA algorithm with an iterative solution refinement to address scale adaptivity and the bias due to subspace skew. Using this enhanced version of NN-GPCA, we detect the attention subspace corresponding to salient regions from only the polar transformation of the intensities of scale-adaptive patches. The clustering constraint is integrated into the subspace-based attention measure. The experiments show that the proposed method can estimate skewed subspaces with high accuracy and that the proposed framework can detect VARs in images without distinct visual contrast.

7. REFERENCES

[1] A. Baraldi and F. Parmiggiani. An investigation of the textural characteristics associated with gray level cooccurrence matrix statistical parameters. IEEE Transactions on Geoscience and Remote Sensing, 33(2):293–304, Mar. 1995.
[2] Y. Hu, D. Rajan, and L.-T. Chia. Robust subspace analysis for detecting visual attention regions in images. In Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 716–724, New York, NY, USA, 2005. ACM Press.
[3] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259, Nov. 1998.
[4] T. Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):79–116, Nov. 1998.
[5] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837–842, Aug. 1996.
[6] R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12):1945–1959, Dec. 2005.
