You are on page 1of 14

2784 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO.

6, JUNE 2015

Band Selection Using Improved Sparse Subspace


Clustering for Hyperspectral Imagery Classification
Weiwei Sun, Liangpei Zhang, Senior Member, IEEE, Bo Du, Senior Member, IEEE, Weiyue Li,
and Yenming Mark Lai

AbstractAn improved sparse subspace clustering (ISSC) Index TermsBand selection, classification, hyperspectral
method is proposed to select an appropriate band subset for hyper- imagery (HSI), improved sparse subspace clustering (ISSC).
spectral imagery (HSI) classification. The ISSC assumes that band
vectors are sampled from a union of low-dimensional orthogonal
subspaces and each band can be sparsely represented as a linear
I. I NTRODUCTION
or affine combination of other bands within its subspace. First, the
ISSC represents band vectors with sparse coefficient vectors by
solving the L2-norm optimization problem using the least square
regression (LSR) algorithm. The sparse and block diagonal struc-
O WNING to advantages in collecting tens to hundreds
of continuous bands of spectrum responses from visible
to near-infrared wavelength, hyperspectral imagery (HSI) has
ture of the coefficient matrix from LSR leads to correct segmenta-
tion of band vectors. Second, the angular similarity measurement powerful performance in recognizing different ground objects
is presented and utilized to construct the similarity matrix. Third, with subtle spectrum divergences through the classification
the distribution compactness (DC) plot algorithm is used to esti- implementation [1]. The classification results of HSI dataset are
mate an appropriate size of the band subset. Finally, spectral now widely used in many realistic applications, such as ocean
clustering is implemented to segment the similarity matrix and the
desired ISSC band subset is found. Four groups of experiments on monitoring [2], land cover mapping [3], [4], precision farming
three widely used HSI datasets are performed to test the perfor- [5], [6], and mine exploitation [7], [8]. Unfortunately, numer-
mance of ISSC for selecting bands in classification. In addition, the ous bands as well as strong intra-band correlations also bring
following six state-of-the-art band selection methods are used to about big problems to the classification implementation [9],
make comparisons: linear constrained minimum variance-based [10]. Especially, the curse of dimensionality renders that the
band correlation constraint (LCMV-BCC), affinity propagation
(AP), spectral information divergence (SID), maximum-variance HSI dataset requires extremely more training samples if accu-
principal component analysis (MVPCA), sparse representation- rate classification result is wanted, whereas collecting so many
based band selection (SpaBS), and sparse nonnegative matrix training samples is expensive and time-consuming [11], [12].
factorization (SNMF). Experimental results show that the ISSC Therefore, making dimensionality reduction is an alternative
has the second shortest computational time and also outperforms way to conquer these problems.
the other six methods in classification accuracy when using an
appropriate band number obtained by the DC plot algorithm. Dimensionality reduction of HSI datasets can typically be
divided into two categories: band selection (i.e., feature selec-
Manuscript received October 17, 2014; revised January 25, 2015; accepted tion) and feature extraction [13]. Band selection selects an
March 13, 2015. Date of publication April 13, 2015; date of current version
appropriate band subset from the original band set of the HSI
July 30, 2015. This work was supported in part by the National Natural Science
Foundation under Grant 41401389, Grant 41431175, and Grant 61471274, dataset while feature extraction preserves important spectral
in part by the Research Project of Zhejiang Educational Committee under features through mathematical transformations. In this paper,
Grant Y201430436, in part by Ningbo Natural Science Foundation under we focus on the band selection category of dimensionality
Grant 2014A610173, in part by the Discipline Construction Project of Ningbo
University under Grant ZX2014000400, Normal project of Shanghai Normal
reduction, since the selected band combination could perfectly
University under Grant SK201525, in part by the Key Laboratory of Mining solve the problem of curse of dimensionality and inherit the
Spatial Information Technology of NASMG under Grant KLM201309, and in original spectral meanings from the original HSI dataset.
part by the K. C. Wong Magna Fund in Ningbo University. Previous work in the dimensionality reduction of band selec-
W. Sun is with the State Key Laboratory for Information Engineering in
Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University,
tion can also be roughly divided into two classes: 1) the maxi-
Wuhan 430039, China, and also with the College of Architectural Engineering, mum information or minimum correlation (MIMC) scheme and
Civil Engineering and Environment, Ningbo University, Ningbo 315211, China 2) the maximum inter-class separability (MIS) scheme. MIMC
(e-mail: nbsww@outlook.com). selects an appropriate band subset in which each single-band
L. Zhang is with the State Key Laboratory for Information Engineering in
Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University, image has the maximum information or minimum correlation
Wuhan 430039, China (e-mail: zlp62@whu.edu.cn). with other bands. The MIMC scheme typically uses the three
B. Du is with the School of Computer, Wuhan University, Wuhan 430039, main criteria of entropy criterion, the intra-band correlation cri-
China (e-mail: gunspace@163.com). terion, and the cluster criterion. The entropy criterion algorithm
W. Li is with the Institute of Urban Studies, Shanghai Normal University,
Shanghai 200234, China (e-mail: lwy_326@126.com). collects an appropriate band subset by maximizing the overall
Y. M. Lai is with the Institute for Computational Engineering and Sciences amount of information using entropy-like measurements [14],
(ICES), University of Texas at Austin, Austin, TX 78712 USA (e-mail: [15]. The intra-band correlation criterion algorithms select the
yenming.mark.lai@gmail.com).
appropriate band subset having minimum intra-band correla-
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. tions. Examples include the mutual information-based algo-
Digital Object Identifier 10.1109/JSTARS.2015.2417156 rithm [16] and the constrained band selection algorithm based

1939-1404 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2785

on constrained energy minimization (CBS-CEM) [17]. The sparsely represented as a linear or affine combination of
cluster criterion algorithm also considers intra-band corrections other bands [31]. The two above assumptions differ from the
and selects a representative band from each band cluster using assumptions of current SB algorithms and clustering algo-
certain clustering algorithms. Examples include the hierarchical rithms. Second, our proposed ISSC method improves the SSC
clustering algorithm using the mutual information measure- method with three following modifications: The ISSC uses the
ment [18] and the affinity propagation (AP) algorithm with l2 -norm to avoid the too sparse coefficient vector solution
noise-removal bands using wavelet shrinkage [19]. when using the l1 -norm and to ensure the coefficient matrix
In contrast, the MIS scheme selects an appropriate band is sparse and block diagonal. Our proposed angular similar-
subset that maximizes the separability of different ground ity measurement replaces the l1 -directed graph construction
objects in the image scene. The MIS scheme is typically imple- (DGC) measurement in SSC and better represents the total
mented using one of the following algorithms: the distance similarity between two sparse coefficient band vectors than
measurement criterion algorithm, the feature transformation the isolated coefficient values. The distribution compactness
criterion algorithm, and the realistic application criterion algo- (DC) plot algorithm intelligently estimates the size of band
rithm. The distance measurement criterion algorithm maxi- subset and eliminates errors from artificial estimation in SSC.
mizes the inter-class differences using a distance-like measure- The above improvements in SSC technique by the proposed
ment such as the spectral information divergence (SID), the method ensure good performance in selecting an appropriate
transformed divergence (TD), or Mahalanobis distance [20]. band subset.
The feature transformation criterion algorithm selects an appro- This paper is organized as follows. Section II reviews the
priate band subset by analyzing the inter-class separability classical sparse subspace clustering. Section III presents the
of ground objects in a low-dimensional feature space found band selection method using the proposed ISSC. Section IV
through feature transformations. Examples include linear pre- analyzes the performance of ISSC in band selection for clas-
diction algorithm [21] and the complex network algorithm sification on three widely used HSI datasets. Section V states
[22]. The realistic application criterion algorithm selects an conclusion and outlines our future work.
appropriate band subset through maximizing or minimizing the
defined objective function suitable for realistic applications of
HSI dataset, and the typical examples are the band selection II. A R EVIEW OF S PARSE S UBSPACE C LUSTERING
algorithm using high-order movements [23] and the super-
In this section, we review the classical SSC method. We
vised band selection algorithm using the known class spectral
choose to use a noise-free dataset rather than noisy dataset
signatures [24].
to illustrate principles more clearly. SSC proposed by Ehsan
In recent years, the study of sparsity in HSI datasets has
Elhamifar regards that each data point lies in a union of sub-
attracted much interest in the remote sensing community. The
spaces corresponding to several classes or clusters to which
sparsity theory states that each band vector or spectrum vec-
the dataset belongs; therefore, each point can be sparsely rep-
tor can be sparsely represented using only a few nonzero
resented by other points of a union of subspaces. The SSC
coefficients in a suitable basis or dictionary [25]. Sparse rep-
then uses sparse representations of data points to cluster the
resentations of a band vector can then reveal certain under-
points into separate subspaces [31]. SSC has been used in a
lying structures such as the clustering structure within the
wide variety of applications such as motion segmentation and
HSI dataset and also drastically reduce the computational bur-
face clustering [32]. It typically consists of three stages: 1) find-
den in processing HSI datasets [26]. Accordingly, researchers
ing sparse representations of each data point through convex
began to study the band selection problem using sparsity-
optimization; 2) learning a similarity matrix (i.e., a weight
based (SB) schemes and have proposed algorithms such as
matrix); and 3) clustering the similarity matrix using spectral
the sparse representation-based band selection (SpaBS) algo-
clustering [33].
rithm [27], the sparse nonnegative matrix factorization (SNMF)
Assume a high-dimensional dataset without noises Y =
algorithm [28], the collaborative sparse model (CSM) algo- N
{yi }i=1 RDN are actually lying in a union of linear sub-
rithm [29], and the sparse support vector machine (SSVM) k k
algorithm [30]. spaces {Cl }l=1 with dimensions {dl }l=1 , where D is the
In this paper, we address the band selection problem using dimension of high-dimensional space and k is the number of
an idea inspired by sparse space clustering (SSC). We present subspaces or clusters. Specifically, for a band dataset in hyper-
a band selection method using an improved version of SSC spectral field, all band vectors constitute the high-dimensional
which we call improved sparse space clustering (ISSC). In dataset, the number of bands N corresponds to the size of data
particular, our motivation is to ameliorate the SSC techniques points, and the number of pixels D determines the dimension-
by ISSC and implement the ISSC into HSI dataset in order ality of high-dimensional space. We assume each point yi
to solve the band selection problem. Our contributions are RD1 lies in exactly one of the k linear spaces Cl . Hence, each
as follows: First, we are the first to explore band selection linear subspace Cl contains a cluster of unique Nl data points
from the SSC perspective. The ISSC method assumes that and the number of points within subspace Cl , Nl is greater than
the subspace dimension size dl , i.e., for each subspace Cl , we
all bands (i.e., band vectors) of the HSI dataset are drawn  Nl
from a union of low-dimensional orthogonal subspaces rather have the cluster of data points Yl = yjl j=1 RDNl with
than a single uniform subspace, and that each band can be Nl > dl . This placement of N points into the k subspaces also
2786 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 6, JUNE 2015


k minimized using convex programming algorithms such as the
implies Nl = N . Accordingly, each data point yi RD1 interior point method [35], the basis pursuit algorithm [36], and
l=1
belonging to Cl can be represented as the alternating directions algorithm [37].
Sparse matrix Z is then utilized to construct the similarity
yi = y1l 1l + y2l 2l + yjl jl + yN
l
l
l Nl
matrix W for inferring the segmentation of all data points into
l different subspaces. The matrix W can be regarded as an undi-
1
2l rected weighted graph, where each entry Wij represents the
weight of the edges between pairwise points yi and yj . The SSC
..
 l l 
= y1 y2 yjl yN l . method utilizes the l1 -directed graph construction (DGC) mea-
l l
j surement to guarantee the symmetrization of weights between
. pairwise points yi and yj . The DGC measurement is performed
..
l as follows:
N l
|Zij | + |Zji |
= Yl l , jl = 0 if yi = yjl (1) Wij = (5)
2
where l RNl 1 and Y l are the coefficient vector and dic- After
where |Zij | is the absolute value of the entry Zij in Z.
tionary of yi , respectively. The constraint jl = 0 if yi = yjl constructing the similarity matrix W, the SSC uses spectral
in (1) is to avoid the trivial solution of reconstructing point yi clustering [33] to cluster all data points into their underlying
as a linear combination
 k of itself. If we combine all the cluster subspaces.
dictionaries Yl l=1 , the point yi can be represented using

yi = Y 1 1 + Y 2 2 + + Y k k
1 III. BAND S ELECTION OF HSI DATASET U SING ISSC

 1 2  2 In this section, the band selection method using ISSC

= Y Y Yk . = Ai (2) is described. Section III-A presents sparse representa-
..
tions of each band vector through solving the l2 -norm.
k
Section III-B describes the construction of the similarity
  matrix using our proposed angular similarity measurement.
where = Y1 Y2 Yk RDN is the arrangement of
Section III-C explains the appropriate estimation of band clus-
k cluster dictionaries and Ai RN 1 is the coefficient vector
ters using the DC plot algorithm and shows how to select
from all data points. Assume that Y = , T = T = I,
an appropriate band subset using spectral clustering. Finally,
where is the permutation matrix of all cluster dictionaries,
Section III-D summarizes our band selection method using
when combining all N data points into a matrix format by
ISSC.
placing points in individual columns, (2) can be transformed
into

Y = YZ, diag (Z) = 0 (3) A. Sparse Representations of Band Vectors Using L2-Norm
N
where Z = T [A1 A2 AN ] RN N is the coefficient Consider a collection of HSI band vectors Y = {yi }i=1
DN
matrix of all data points and diag(Z) is the vector of diago- R that are drawn from a union of orthogonal subspaces
k
nal entries in Z. The constraint diag(Z) = 0 is to eliminate the {Cl }l=1 , where D is the dimension of high-dimensional space
trivial solution that each data points is simply a linear combi- and is equal to the number of pixels in the image scene, N is the
nation of itself. The ideal solution of (3) finds a set data points number of bands with N << D, and k is the number of band
from a single subspace where the number of nonzero entries clusters (i.e., the number of underlying subspaces). Each band
of yi coincides with the dimensionality of this subspace. The vector yi is contaminated with Gaussian noise and is sparsely
ideal solution guarantees that the coefficient matrix Z is sparse represented by other band vectors as follows:
or block diagonal and could benefit the correct segmentation of
all data points into separate subspaces. yi = YZi + e, Zii = 0 (6)
The solution Z of (3) can be regarded as the optimization
where Zi = [Zi,1 Zi,2 Zi,N ]T is the coefficient vector of
problem of minimizing the following objection function: band vector yi , and e is the error term with a bounded norm. The
= arg minZ , error in (6) results from noises in band vectors and the approx-
Z q subject to Y = YZ and diag(Z) = 0
imation errors in sparse representation by other band vectors.
(4) We combine all the band vectors by placing them column by
column to achieve the following matrix format:
where Zq represents the lq -norm of Z defined as Zq  =
 1
N N q q Y = YZ + E, diag(Z) = 0 (7)
i=1 j=1 |Z ij | . The l0 -norm minimization counts the
number of nonzero entries in Z and can be solved using the where Z = [Z1 Z2 ZN ] and E = [e1 e2 eN ] are the
smoothed L0 algorithm [34]. The l1 -norm can be efficiently sparse coefficient matrix and error matrix of all band vectors,
SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2787

respectively. Like (4), (7) can be solved by optimizing the where [Z] i = (YT Y + I)1 YT yi is the ith column
i i i
following problem: of desired optimal solution Z and i = y T yi +
i
= arg minZ ,
Z subject to Y YZ yiT Yi (YiT Yi + I)1 YiT yi . We then substitute the
q  1
equation T (YT Y + I)1 = T (YT Y + I) into
and diag(Z) = 0 (8)
(11) resulting from the property of matrix , and the
where is the norm bound of the error. When choosing q = 1, optimal solution is achieved as Z = (YT Y + I)1
T 1 1 = 0.
the optimization problem (9) can be transformed into the well- (diag((Y Y + I) )) , where diag(Z)
known lasso problem [38]
= arg min Y YZ2 + Z , subject to diag(Z) = 0
Z B. Angular Similarity Measurement Between Sparse
2 1
(9) Coefficient Band Vectors
where Z1 denotes the l1 -norm of the matrix Z, Z1 = After solving problem (10), the ISSC constructs the similar-
N  N ity matrix W using the sparse coefficient matrix Z of all band
i=1 j=1 |Zij | and > 0 is a scalar regularization param-
eter that balances the weight of the error terms and the sparsity vectors. The similarity matrix can be regarded as an undirected
of the coefficient matrix. The assumption of underlying sub- weighted graph G = (V, W), where Vij V represents the
spaces in all band vectors guarantees that the l1 -norm achieves edge between pairwise bands yi and yj , and the weight Wij
a sparse and block diagonal coefficient matrix for further seg- W measures the similarity between sparse representations of
mentation. However, the extremely high intra-band correlations pairwise band vectors yi and yj . The DGC similarity mea-
may cause the sparse representations of band vectors to select surement in SSC assumes that the nonzero entries of sparse
coefficient vector Z i reflect the closeness or similarity with
only one band at random [39], which would bring about the
too sparse solution of problem (9) by segmenting the within- other bands from the same subspace, since the ideal similar-
cluster bands into different subspaces. Therefore, we relax (7) ity matrix has only edges between pairwise band vectors from
by optimizing the following l2 - norm problem: the same subspace [33]. Some authors have proposed relevant
measurements to improve the performance of the DGC similar-
= arg min YYZ2 + Z2 ,
Z subject to diag(Z) = 0 ity measurement, such as the sparsity induced similarity [43]
2 2
(10) and the nonnegative sparsity induced similarity [44]. The above
measurements utilize individual sparse coefficients to compute
where Z2 represents the l2 -norm (the Frobenius norm) of the weights of two bands. However, the local similarity repre-
 1
N N 2 2 sented by sparse coefficients cannot describe the true similarity
matrix Z, Z2 = i=1 j=1 Zij . It has been proved that
between two bands because sparse coefficients from real-world
both the orthogonal subspace assumption of band vectors and
HSI dataset are sensitive to noise and outliers [45].
the l2 -norm of matrix Z can guarantee the optimal solution in
We present an angular similarity (AS) measurement using
problem (10) is both sparse and block diagonal even if the band
the angular information between two sparse coefficient band
numbers N is smaller than the dimension of high-dimensional
vectors. The AS measurement assumes that sparse represen-
space D (i.e., insufficient data sampling with N << D) [40].
tations of two similar bands from the same subspace should
Moreover, the optimal solution in problem (10) is proven to
have a small angle between the two of them since both of them
have a grouping effect with intra-band correlation dependent
are sparsely represented in a similar combination using other
band vectors, and this grouping effect can segment highly cor-
bands. For each pairwise sparse coefficient vectors Zi and Zj
related band vectors [41]. Therefore, the optimal solution of
that represent coefficient vectors of yi and yj , respectively, the
problem (10) can minimize the intra-cluster affinities of band
AS similarity measurement is defined as follows:
vectors and capture the correlation structure of band vectors
from the same orthogonal subspace for correct clustering. 2
In this paper, we utilize the least squares regression (LSR)
Zi Zj
WASij =  2  2 (12)
[41] algorithm to solve the optimization problem (10). Let    
Yi = Y\yi = [y1 y2 yi1 yi+1 yN ] be the remaining Z i  Z j 
column set of Y after removing the column yi , and let Ei = 2
(YiT Yi + I)1 and Y = [Yi yi ], where is the per- where denotes the inner product of two band vectors and 
mutation  matrix with T = T = I. The LSR factorizes is the norm of the vector. The use of the squaring operation
T T 1 in (12) is to guarantee a positive value for each WASij and to
matrix (Y Y + I) to achieve the optimal solution
using the Woodbury formula [42]
Z, increase the separability of sparse band vectors from different
 T 1 subspaces. By computing all combinations of columns indexes
 T T 1 Yi Yi + I YiT yi and row indexes using (12), the similarity matrix W is found
(Y Y + I) =
yiT Yi yiT yi + for further clustering.
 T 
(Yi Yi + I)1 0
=
0 0 C. Band Clustering With an Appropriate Cluster Number
 T

[Z] [Z]
i [Z] i
+ i i (11) When constructing the similarity matrix, the spectral clus-
T
[Z] 1
i tering algorithm is utilized to segment the weighted graph into
2788 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 6, JUNE 2015

k clusters. However, a difficult problem is how to estimate an


appropriate number k of band clusters, because the k signifi-
cantly affects the clustering result of band vectors. In general,
the cluster number k is arbitrarily determined with estima-
tion. However, a too small k would divide highly correlated
bands into other different subspaces and render an error-prone
band subset. In contrast, a too large k would bring about too
much computational burden for further classification of HSI
dataset. To address these problems, a variety of methods have
been proposed to help estimate an appropriate k and can be
classified into two main classes, the posterior method and the
prior method. The posterior method tests all possible num-
ber of clusters and then selects an appropriate cluster number
using defined criteria such as the over information-theoretic
criterion [46], the gap statistic criterion [47], and the mini-
mum description length criterion [48]. The posterior method is
computationally expensive, and all candidate numbers of clus-
ters have to be tested explicitly. In contrast, the prior method
estimates the cluster number before implementing spectral clus-
tering. Prior methods include the Eigen-gap heuristic algorithm
[49] and the edge based-algorithm [50].
Since the similarity matrix W is ideally sparse and block
diagonal, we introduce a DC plot algorithm [51] to estimate the Fig. 1. Band selection using ISSC.
appropriate number k of band clusters. The algorithm uses a
nonparametric estimate of the probability density function of
the band dataset and determines the appropriate number k of corresponding band vectors are segmented into their underly-
band clusters through analyzing the DC of sparse representa- ing subspaces. 4) The band whose corresponding row vectors
tions of band vectors in the kernel space constructed by the in Uk is closest to the mean vector (i.e., the vector of the cen-
k
AS measurement. The DC measurement of all clusters uses an troid) of its cluster {Cl }l=1 in terms of Euclidean distance is
eigenvalue decomposition of the similarity matrix as follows: chosen as an element of the band subset for HSI dataset and
hence the appropriate band subset from ISSC is achieved.
 N
  2
DC = p(Z) 1T W1N =
dZ i 1TN ui (13)
N

Z i=1
D. Summary of Band Selection Using ISSC
In the above three sections, we provided three improvements
where 1N is a N 1 dimensional vector that contains all ones, of the classical SSC when implementing band selection on a
W is the similarity matrix, and i and ui are the ith eigenvalues HSI dataset. 1) We set the lq -norm problem in (5) to the l2 -norm
and eigenvectors of W, respectively. The appropriate
 cluster problem to achieve sparse representations of each band vector
2  using the LSR algorithm. The l2 -norm guarantees the coeffi-
number is obtained by analyzing the plot of log i 1TN ui
cient matrix is both sparse and block diagonal and can also
against i. The logarithm value rather than the direct value of
 2 minimize the between-cluster similarity for correct segmenta-
i 1TN ui is used to better explain large shifts of the plot at tion of band vectors. 2) The AS measurement improves the
i = 1 and also to smooth the DC data. The logarithm plot is performance of the similarity matrix constructed with the DGC
additionally smoothed using the average filter of size 3 since measurement. The AS measurement considers the overall sim-
the log-likelihood function is sensitive to variance in the data. ilarity between two sparse coefficient band vectors rather than
The number corresponding to the elbow of the logarithm plot only the similarity between isolated sparse coefficients. 3) We
is then selected as the appropriate number k. use the DC plot algorithm to estimate an appropriate number k
Given the appropriate cluster size k of band vectors, spec- of band clusters to avoid errors caused from variance in data.
tral clustering then implements band selection in the following The process of band selection using the ISSC is shown in Fig. 1
four stages: 1) The symmetric normalized Laplacian matrix and is implemented in the following five steps.
Lsym is built from the similarity matrix W using Lsym = 1) The HSI imagery is transformed from a data cube into
D1/2 WD1/2 , where D is a diagonal matrix constructed two-dimensional (2-D) band dataset Y RDN , where
with the diagonal entries of W. 2) The first k eigenvectors D and N are the dimension of band vectors (i.e., the
Uk = [u1 u2 uk ] RN k are found through a singular number of pixels) and the number of bands, respectively.
value decomposition of the Laplacian matrix Lsym and each 2) Sparse representations of band vectors are constructed
row vector Hi of Uk is normalized to norm 1 using Hij = using the LSR algorithm by solving the optimization
 2 1/2
uij / . 3) Row vectors in normalized Uk are clus- problem (10) and sparse coefficient matrix Z of band
k uik
tered into k clusters using the K-means algorithm, and the vectors is found.
SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2789

TABLE I
C ONTRAST IN C OMPUTATIONAL C OMPLEXITY B ETWEEN ISSC AND S IX
OTHER BAND S ELECTION M ETHODS

Fig. 2. Image of Indian Pines dataset.


3) The similarity matrix W is found with the AS mea-
surement in (12), where each entry of W represents HSI datasets Indian Pines, Pavia University (PaviaU), and
the similarity between pairwise sparse coefficient band Urban datasets. Section IV-B lists the detailed results from the
vectors. four groups of experiments, and Section IV-C analyzes and
4) The appropriate number k of band clusters is estimated discusses the experimental results from Section IV-B.
using the DC plot algorithm in (13). Spectral clustering is
then used to segment the similarity matrix into separate
clusters. A. Descriptions of Three HSI Datasets
5) The bands nearest to the centroid of their cluster are then The first dataset is Indian Pines from the Multispectral
the desired band subset using ISSC. Image Data Analysis System Group at Purdue University
In ISSC, the computational complexity of sparse representa- (https://engineering.purdue.edu/~biehl/MultiSpec/aviris_docum
tions of all band vectors using LSR is O(N 2 D), the computa- entation.html). The dataset was acquired by NASA on June
tional complexity in constructing the angular similarity matrix 12, 1992 using the AVIRIS sensor from JPL. The dataset
is O(N 2 ), and the computational complexity of clustering has 20 m spatial resolutions and 10 nm spectral resolutions
bands with the appropriate number is O(kN t), where D and N covering a spectrum range of 2002400 nm. A subset of the
are the dimension of the band vectors and the number of bands, image scene of size 145 145 pixels depicted in Fig. 2 is
respectively, and k and t represent the number of band clus- used in our experiment and covers an area of 6 miles west of
ters and iterations, respectively. Therefore, the total complexity West Lafayette, Indiana. The dataset was preprocessed with
of ISSC roughly equals O(N 2 D + N 2 + kN t). Considering radiometric corrections and bad band removal, and afterward
the fact that O(N 2 D) dominates O(N 2 ) and O(kN t), the 200 bands remained with calibrated data values proportional
complexity of ISSC is approximately O(N 2 D). We com- to radiances. Sixteen classes of ground objects exist in the
pare the complexity of ISSC with six other state-of-the-art image scene, and the ground truth for both training and testing
band selection methods. We test two MIMC approaches, linear samples for each class is listed in Table II.
constrained minimum variance-based band correlation con- The second is Pavia University (PaviaU) dataset taken from
straint (LCMV-BCC) [17] and AP [52], two MIS approaches, the Computational Intelligence Group in the Basque University
SID [20] and maximum-variance principal component analy- (http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote
sis (MVPCA) [53], and two SB approaches, SpaBS [27] and _Sensing_Scenes). The dataset was achieved from ROSIS sen-
SNMF [30]. The comparison in computational complexity of all sor with 1.3 m spatial resolutions and 115 bands. After
seven methods is listed in Table I, where K denotes the sparsity removing low signal to noise ratio (SNR) bands, 103 bands
level of SpaBS. Since N 2 < D, N 2 D < D2 << D2 N , and were left for further analysis. The small subset of the larger
K < k << N << D, we deduce that the computational com- dataset shown in Fig. 3 is used in our experiments. The test
plexity of ISSC is lower than that of AP, MVPCA, SpaBS, and dataset contains 350 340 pixels and covers the area of Pavia
LCMV-BCC but higher than that of SID. Moreover, the SpaBS University. The image scene has nine classes of ground objects
has the highest computational complexity among all the meth- including shadows, and the ground truth information of training
ods, whereas LCMV-BCC has the second highest computa- and testing samples in each class is listed in Table III.
tional complexity. In addition, the computational complexity of The third dataset is the Urban dataset acquired from the U.S.
AP is higher than that of the other four methods SID, MVPCA, Army Geospatial Center (www.tec.army.mil/hypercube). The
SNMF, and ISSC. In summary, ISSC has a lower computational dataset was collected by a HYDICE sensor with 10 nm spec-
complexity among all seven band selection methods. tral resolution and 2 m spatial resolutions. The low SNR band
sets [14, 76, 87, 101111, 136153, 198210] were eliminated
from the initial 210 bands, leaving the final 162 bands. Fig. 4
IV. E XPERIMENTAL R ESULTS AND A NALYSIS shows a small image subset of size 307 307 pixels selected
In this section, four groups of experiments on three famous from the larger image. The small dataset covers an area at
HSI datasets are implemented in order to testify our pro- Copperas Cove near Fort Hood, TX, USA, and has 22 classes of
posed ISSC method when selecting an appropriate band subset. ground objects in the image scene. Table IV shows the ground
Section IV-A describes the relevant information of the three truth information for training and testing samples in each class.
2790 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 6, JUNE 2015

TABLE II
G ROUND T RUTH OF T RAINING AND T ESTING S AMPLES IN E ACH C LASS FOR I NDIAN P INES DATASET

between the scalar parameter in ISSC and the classifica-


tion accuracies. The experiment helps to determine a proper
when using ISSC in real-world applications of HSI classifica-
tion. The following experimental results without specific nota-
tions are the average results of ten different and independent
experiments.
1) Quantitative Evaluation of the ISSC Band Subset: The
experiment evaluates the band selection results obtained before
classification from ISSC and other six methods using three
quantitative measures. We use the average information entropy
(AIE) to measure the information amount and to evaluate the
richness of spectrum information in the band subset. We com-
pute the average correlation coefficient (ACC) to estimate the
intra-band correlations in the band subset. The average relative
Fig. 3. Image of PaviaU dataset.
entropy (ARE) (also called average KullbackLeibler diver-
gence, AKLD) is used to measure the inter-separabilities of
selected bands and assess the distinguishability within the band
B. Experimental Results subset for classification. We use the three above quantitative
We conduct four groups of experiments using three HSI measures because they measure the three desired performance
datasets above to test the performance of our ISSC method characteristic of selecting an appropriate band subset having
in selecting an appropriate band subset for classification. Six high information amount, low intra-band correlations, and high
state-of-the-art methods in band selection are used to make inter-separabilities in the band subset. In the experiment, the
holistic comparisons with our methods, including two MIMC appropriate size of band subset k (i.e., the number of bands in
approaches LCMV-BCC and AP, two MIS approaches SID and the subset) is estimated using the DC plot algorithm and is then
MVPCA, and two SB approaches SpaBS and SNMF. First, we set as the dimension of band subset for all the methods. The k in
quantify the band selection performance of ISSC and compare Indian Pines dataset is 12, the k in PaviaU dataset is 10, and the
the results with those of the other six methods. The experi- k in Urban dataset is 20. In the SNMF method, the parameter
ment assesses the performance of the ISSC in band selection that controls the entry size of dictionary matrix and the param-
before classification. Second, we compare the computational eter that determines the sparseness of coefficient matrix are
time of ISSC against six other band selection methods when determined using cross-validation and the optimal and hav-
varying the size of band subset k. The experiment investigates ing the best result are selected. For the Indian Pines dataset,
the computational performance of ISSC. Third, we compare the and of SNMF are chosen as 3.0 and 0.05, respec-
the classification accuracies of ISSC against that of the six tively, and the scalar parameter in ISSC chosen to be 0.1
other methods. Two widely used classifiers are used in the after cross-validation. In the PaviaU dataset, the and are
experiment, K-nearest neighbor (KNN) [54] and support vector 4.0 and 0.001, respectively, and the is 0.001. The , , and
machine (SVM) [55] classifiers. The overall classification accu- in Urban dataset are 3.5, 0.01, and 0.05, respectively. The iter-
racy (OCA) and average classification accuracy (ACA) is used ation time t for the learning dictionary in SpaBS is manually
to measure the classification performance of all seven meth- set as 5 for all three datasets. Table V lists detailed information
ods in the experiment. The KNN classifier uses the Euclidean about the parameters of the above seven methods on the three
distance, and the SVM classifier uses the radial basis func- HSI datasets.
tion (RBF) kernel function with the variance parameter and Table VI illustrates the quantitative evaluation results of all
the penalization factor obtained via cross-validation. For each the methods on the three HSI datasets. For the Indian Pines
dataset, we repeatedly subsample the training samples and test- dataset, the ISSC has the highest AIE and ARE, whereas SNMF
ing samples ten times. Finally, we investigate the relationships has the second lowest ACC since the ACC of SNMF. The
SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2791

TABLE III
G ROUND T RUTH OF T RAINING AND T ESTING S AMPLES IN E ACH C LASS FOR PAVIAU DATASET

ISSC, and the computational time of SNMF is longer than those


of MVPCA, ISSC, and SID. The MVPCA has longer computa-
tional time than SID and ISSC. The computational time of SID
is shorter than that of ISSC and has the shortest computational
time among all the methods. We found that these computational
times coincide with the analysis of computational complexity
of all the methods in Section III-D. The computational times in
increasing order of all the seven methods are as follows: SID,
ISSC, MVPCA, SNMF, AP, LCMV-BCC, and SpaBS.
3) Classification Performance of ISSC: This experiment
measures the classification performance of the ISSC method.
Fig. 4. Image of Urban dataset. Our aim is to make holistic evaluations in classification perfor-
mance by varying the size of band subset k rather than using a
certain predefined band number. As we did in the above exper-
SID and MVPCA behave worse due to their higher ACC and iments, we compare classification accuracies using the OCA
lower AIE and ARE. For the PaviaU dataset, the ISSC performs and ACA. For each dataset, we repeatedly subsample the train-
best among all the methods for all three quantitative measures. ing samples and testing samples ten times to achieve accurate
When comparing four other methods with ISSC excluded, the classification accuracies. In the experiment, the size of band
SID and MVPCA show higher ACC and lower AIE and ARE; subset k in Indian Pines dataset varies from 2 to 45 with a step
therefore, the band subset of the two methods have poorer interval of 2, and the size of band subset k in PaviaU dataset
performance than that of the two other methods. The ISSC per- and Urban datasets vary between 2 and 50 with a step interval
formance in the Urban dataset has the highest AIE and lowest of 2. The neighbor size k1 in the KNN classifier is set as 3, and
ACC, and ARE of the ISSC is the second only to that of SNMF. the threshold of total distortion in the SVM classifier is set as
The SID performance on the Urban dataset was similar to its 0.01. Using cross-validation, the and in SNMF of PaviaU
performance on the Pavia U dataset. It again performed worse dataset are chosen as 3.0 and 0.1, respectively, the and in
than the other methods in the three quantitative measures. Urban dataset are chosen as 4.0 and 1.5, respectively; and the
2) Computational Performance of ISSC: The experiment and in Urban dataset are chosen as 3.5 and 0.05, respectively.
tests the computational speed performance of ISSC and the Other parameters are the same as their counterparts in the above
other six methods when varying the sizes of band subset k. For experiments. Table V details parameter configurations of all the
the Indian pines dataset, k is set between 6 and 24 with a step methods.
interval of 6; k in PaviaU dataset is set between 5 and 25 with Fig. 5 illustrates the OCA results of all seven methods using
a step interval of 5; and k in Urban dataset is set between 10 the SVM and KNN classifiers on the three datasets. We did not
and 40 with a step interval of 10. Other parameters in all the list the ACA results because of the similarity between the ACA
seven methods are the same as their counterparts in the previ- curve and the OCA curve. For each dataset and each classifier,
ous experiment. Table V details parameter configurations of all the OCA is small for band number k less than 5 and the OCA
the methods. rises with increasing k. The OCA changes slowly after a cer-
We run the experiment on a Windows 7 computer with Inter tain threshold of the band number k and most curves become
i7-4700 Quad Core Processor and 16 GB of RAM. Both the almost flat with slight fluctuations. The SID with each classi-
ISSC and the other six methods algorithms are implemented in fier and each HSI dataset always has the lowest value among all
MATLAB 2014a. Table VII shows the comparisons in the com- the methods. For each dataset using KNN and SVM classifiers,
putations times of the seven methods on the three HSI datasets. ISSC outperforms the other six methods after a certain thresh-
For each HSI dataset, the computational times of all the meth- old k. The OCA results of ISSC are the best among all the seven
ods gradually increase with increasing k. The SpaBS takes methods.
the longest computational times among all the methods. The Moreover, we found that the threshold k for the ISSC curves
LCMV-BCC has a shorter computational time than SpaBS but of each dataset is located around the appropriate band num-
is still slower than the other five methods. The computational ber estimated by the DC plot algorithm. Therefore, using the
time of AP is longer than those of MVPCA, SID, SNMF, and appropriate band number k from the DC plot algorithm, we
2792 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 6, JUNE 2015

TABLE IV
G ROUND T RUTH OF T RAINING AND T ESTING S AMPLES IN E ACH C LASS FOR U RBAN DATASET

TABLE V
L ISTS OF PARAMETERS IN A LL THE E XPERIMENTS ON THE T HREE HSI DATASETS

TABLE VI
C ONTRAST IN Q UANTITATIVE E VALUATION OF BAND S UBSETS FROM A LL S EVEN M ETHODS ON THE T HREE DATASETS
SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2793

TABLE VII
C OMPUTATIONAL T IMES OF S EVEN BAND S ELECTION M ETHODS U SING D IFFERENT C HOICES OF K ON THREE HSI DATASETS

Fig. 5. OCA results of all the seven methods on the three HSI datasets. (a), (c), and (e): SVM; (b), (d), (f): KNN.

make detailed comparisons in the OCA and ACA results of that the SID has the worst performance of all the method. The
all the methods on the three datasets. As in experiment 1), the above observations further support the results in Fig. 5.
appropriate band number k in Indian Pines dataset is 12, the 4) Effect From the Scalar Parameter on the Sensitivity of
k in PaviaU dataset is 10, and the k in Urban dataset is 20. Classification Accuracy: The experiment explores the effect
Table VIII shows that the ISSC has the best classification accu- from the scalar parameter in ISSC on the OCA and ACA
racies for all three datasets using different classifiers, that the results of three HSI datasets when varying from smaller to
AP behaves better than SID and MVPCA in classification, and larger values. Since the spectral values have similar magnitudes
2794 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 6, JUNE 2015

TABLE VIII
C LASSIFICATION ACCURACIES OF A LL THE M ETHODS U SING AN A PPROPRIATE S IZE OF BAND S UBSET ON T HREE DATASETS

TABLE IX
E FFECT F ROM S CALAR PARAMETER IN ISSC ON C LASSIFICATION ACCURACIES OF THE T HREE DATA HSI DATASETS

in all three datasets, we vary the scalar parameter in all three classification. ISSC is compared against the six popular band
datasets from 0.001 and 100, and choose the test candidates selection methods LCMV-BCC, AP, SID, MVPCA, SpaBS, and
from the set [0.0001, 0.001, 0.01, 0.1, 0.5, 1, 5, 10, 50, 100]. SNMF. The three quantitative measures of AIE, ACC, and ARE
We did not investigate the effect of in classification with show that the ISSC has the best performance among all seven
a certain step interval value because the scale of the param- methods. The ISSC assumes that all bands lie in a union of
eter is too large to make detailed analysis. As in experiment subspaces and that each band can be sparsely represented with
3), the sizes of the ISSC band subset on the three datasets are other bands from its subspace. The sparse and block diagonal
estimated using the DC plot algorithm. The band number k in coefficient matrix found by solving the l2 -norm optimization
Indian Pines dataset is 12, the k in PaviaU dataset is 10, and problem with LSR shows a grouping effect. Furthermore, the
the k in Urban dataset is 20. Other parameter configurations optimization solution can guarantee subspace segmentation of
in the experiment are the same as their counterparts in exper- band vectors. The ISSC band subset has high intra-band sepa-
iment 3) and the detailed parameter settings in the experiment rabilities and low intra-band correlations and each band image
are listed in Table V. has high information amount. The ISSC satisfies the demands
Table IX shows the OCA and ACA results of ISSC using of band subset selection and hence is an appropriate method for
different classifiers and different HSI datasets. For all three selecting an appropriate band subset for classification. In con-
datasets, we observe that when increasing the scalar parame- trast, the SID and MVPCA perform worse in the quantitative
ter , the classification accuracies of ISSC gradually decrease evaluations and are poor choices for band selection.
with small fluctuations. Moreover, the OCA and ACA decrease The experiment in computational performance shows that the
quickly for between 0.0001 and 1, whereas the decline in ISSC has shorter computational times than those of LCMV-
OCA and ACA is much slower when is from 1 to 100. BCC, AP, MVPCA, SpaBS, and SNMF. The SID has the
This implies the scalar parameter has a great effect on shortest computational times of all. The speed advantage of
classification accuracy of the ISSC band subset for HSI dataset. SID is because SID only computes the diagonal entries of
the similarity matrix. The AP has longer computational times
than SID because it computes the entire similarity matrix of
C. Analysis and Discussion band vectors rather than only computing the diagonal elements
The above four groups of experiments on the three of the similarity matrix. The longer computational times of
HSI datasets test the performance of our ISSC method in MVPCA are due to the computation in principal component
SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2795

analysis (PCA) transformation of HSI dataset. The lowest try the l0 -norm for the ISSC hoping to further ameliorate and
computational speeds in SpaBS results from the huge com- improve its classification performance.
putational complexity of dictionary learning using the K-SVD
algorithm [56].
The experiment in classification performance compares clas- ACKNOWLEDGMENT
sification accuracies of ISSC against those of six other methods The authors would like to thank the editor and referees for
(LCMV-BCC, AP, SID, MVPCA, SpaBS, and SNMF). The their suggestions which improved this paper.
results show that SID has the worst performance in classi-
fication although its computational times are shortest. That
coincides with the conclusions in quantitative evaluations of R EFERENCES
SID. Fortunately, when comparing the ACA and OCA of ISSC
[1] B. Du and L. Zhang, A discriminative metric learning based anomaly
against those of other methods, ISSC performs better than the detection method, IEEE Trans. Geosci. Remote Sens., vol. 52, no. 11,
other six methods when using a cluster size obtained from the pp. 68446857, Nov. 2014.
DC plot algorithm. This implies the DC plot algorithm finds an [2] R. A. Garcia, P. R. Fearns, and L. I. McKinna, Detecting trend and
seasonal changes in bathymetry derived from HICO imagery: A case
appropriate band number for selecting an appropriate band sub- study of Shark Bay, Western Australia, Remote Sens. Environ., vol. 147,
set and guarantee good performance of ISSC in HSI dataset pp. 186205, 2014.
classification. Finally, the experiment on the effect from the [3] L. Demarchi et al., Assessing the performance of two unsupervised
dimensionality reduction techniques on hyperspectral APEX data for high
scalar parameter on the classification sensitivity of ISSC resolution urban land-cover mapping, ISPRS J. Photogramm. Remote
shows that the ACA and OCA of ISSC decrease with increas- Sens., vol. 87, pp. 166179, 2014.
ing . Therefore, a smaller scalar parameter is appropriate [4] M. Alonzo, B. Bookhagen, and D. A. Roberts, Urban tree species map-
ping using hyperspectral and lidar data fusion, Remote Sens. Environ.,
for the ISSC when selecting an appropriate band subset, since vol. 148, pp. 7083, 2014.
smaller parameter produce higher classification accuracies [5] W. D. Hively et al., Use of airborne hyperspectral imagery to map soil
and shorter computational times. properties in tilled agricultural fields, Appl. Environ. Soil Sci., vol. 2011,
pp. 113, 2011.
[6] I. Herrmann et al., Ground-level hyperspectral imagery for detecting
weeds in wheat fields, Precis. Agric., vol. 14, no. 6, pp. 637659, 2013.
V. C ONCLUSION AND F UTURE W ORK [7] R. J. Murphy and S. T. Monteiro, Mapping the distribution of ferric iron
minerals on a vertical mine face using derivative analysis of hyperspectral
This paper proposes the ISSC method to select an appropriate imagery (430970 nm), ISPRS J. Photogramm. Remote Sens., vol. 75,
band subset from the HSI dataset. The ISSC assumes that each pp. 2939, 2013.
band is drawn from a union of low-dimensional subspaces and [8] N. Zabcic et al., Using airborne hyperspectral data to characterize the
surface pH and mineralogy of pyrite mine tailings, Int. J. Appl. Earth
each band can be sparsely represented by other bands in its sub- Observ. Geoinf., vol. 32, pp. 152162, 2014.
space. Our algorithm constructs a similarity matrix with sparse [9] D. Scott, The curse of dimensionality and dimension reduction, in
coefficient band vectors. The appropriate band subset is then Multivariate Density Estimation: Theory, Practice, and Visualization.
Hoboken, NJ, USA: Wiley, 1992, pp. 195217.
selected from band clusters using the similarity matrix with [10] A. Plaza et al., Dimensionality reduction and classification of hyperspec-
an appropriate band number found by the DC plot algorithm. tral image data using sequences of extended morphological transforma-
Four groups of experiments are designed and implemented to tions, IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466479,
Mar. 2005.
completely investigate the performance of ISSC. First, the band [11] P. H. Hsu, Feature extraction of hyperspectral images using wavelet and
subset chosen by ISSC is quantitatively evaluated using the matching pursuit, ISPRS J. Photogramm. Remote Sens., vol. 62, no. 2,
AIE, ACC, and ARE measures. Its performance is compared pp. 7892, 2007.
[12] M. Pal and G. M. Foody, Feature selection for classification of hyper-
against the performance of the six popular methods LCMV- spectral data by SVM, IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5,
BCC, AP, MVPCA, SID, SpaBS, and SNMF. The results show pp. 22972307, May 2010.
that the ISSC band subset contains higher information amount, [13] W. Sun et al., Nonlinear dimensionality reduction via the ENH-LTSA
method for hyperspectral image classification, IEEE J. Sel. Topics Appl.
lower intra-band correlations, and higher intra-band separabil- Earth Observ. Remote Sens., vol. 7, no. 2, pp. 375388, Feb. 2014.
ities. Second, the experiment in computational performance [14] E. Arzuaga-Cruz, L. O. Jimenez-Rodriguez, and M. Velez-Reyes,
illustrates that the ISSC has a shorter computational time than Unsupervised feature extraction and band subset selection techniques
based on relative entropy criteria for hyperspectral data analysis,
do the other five methods (LCMV-BCC, AP, MVPCA, SpaBS, AeroSense, vol. 2003, pp. 462473, 2003.
and SNMF). Third, the experiment in classification perfor- [15] P. Bajcsy and P. Groves, Methodology for hyperspectral band selection,
mance shows that classification accuracy of ISSC measured by Photogramm. Eng. Remote Sens., vol. 70, no. 7, pp. 793802, 2004.
[16] B. Guo et al., Band selection for hyperspectral image classification using
ACA and OCA is better than the six other methods when using mutual information, IEEE Geosci. Remote Sens. Lett., vol. 3, no. 4,
an appropriate number found by the DC plot algorithm. In short, pp. 522526, Oct. 2006.
the ISSC achieves the best classification performance with the [17] C.-I. Chang and S. Wang, Constrained band selection for hyperspectral
imagery, IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1575
second shortest computational time. In contrast, the SID proves 1585, Jun. 2006.
to be worse both in quantitative evaluation results and in classi- [18] A. Martnez-Us et al., Clustering-based hyperspectral band selection
fication accuracy. Finally, the experiment on the effect from the using information measures, IEEE Trans. Geosci. Remote Sens., vol. 45,
no. 12, pp. 41584171, Dec. 2007.
scalar parameter on the sensitivity of classification accuracy [19] S. Jia et al., Unsupervised band selection for hyperspectral imagery clas-
of HSI dataset shows that a choice of a small scalar parameter sification without manual band removal, IEEE J. Sel. Topics Appl. Earth
leads to accurate classification. In the future, we will test our Observ. Remote Sens., vol. 5, no. 2, pp. 531543, Apr. 2012.
[20] P. Mausel, W. Kramber, and J. Lee, Optimum band selection for super-
ISSC method against more HSI datasets to further understand vised classification of multispectral data, Photogramm. Eng. Remote
its performance in real-world applications. Moreover, we will Sens., vol. 56, pp. 5560, 1990.
2796 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 6, JUNE 2015

[21] Q. Du and H. Yang, Similarity-based unsupervised band selection for [47] S. Still and W. Bialek, How many clusters? An information-theoretic
hyperspectral image analysis, IEEE Geosci. Remote Sens. Lett., vol. 5, perspective, Neural Comput., vol. 16, no. 12, pp. 24832506, 2004.
no. 4, pp. 564568, Oct. 2008. [48] R. Tibshirani, G. Walther, and T. Hastie, Estimating the number of clus-
[22] W. Xia, B. Wang, and L. Zhang, Band selection for hyperspectral ters in a data set via the gap statistic, J. Roy. Stat. Soc. B, vol. 63, no. 2,
imagery: A new approach based on complex networks, IEEE Geosci. pp. 411423, 2001.
Remote Sens. Lett., vol. 10, no. 5, pp. 12291233, Sep. 2013. [49] M. Honarkhah and J. Caers, Classifying existing and generating new
[23] Q. Du, Band selection and its impact on target detection and classifica- training image patterns in kernel space, in Proc. 21st Stanford Center
tion in hyperspectral image analysis, in Proc. IEEE Workshop Adv. Tech. Reservoir Forecast. Aliate Meeting (SCRF), Stanford University, CA,
Anal. Remotely Sens. Data, Greenbelt, MD, USA, Oct. 2728, 2003, USA, 2008, pp. 138.
pp. 374377. [50] N. Bassiou, V. Moschou, and C. Kotropoulos, Speaker diarization
[24] H. Yang et al., An efficient method for supervised hyperspectral band exploiting the eigengap criterion and cluster ensembles, IEEE Trans.
selection, IEEE Geosci. Remote Sens. Lett., vol. 8, no. 1, pp. 138142, Audio Speech Lang. Process., vol. 18, no. 8, pp. 21342144, Nov. 2010.
Jan. 2011. [51] R. Patil and K. Jondhale, Edge based technique to estimate number of
[25] Y. Zhang, B. Du, and L. Zhang, A sparse representation-based binary clusters in k-means color image segmentation, in Proc. 3rd IEEE Int.
hypothesis model for target detection in hyperspectral images, IEEE Conf. Comput. Sci. Inf. Technol. (ICCSIT), Beijing, China, Jul. 911,
Trans. Geosci. Remote Sens., vol. 53, no. 3, pp. 13461354, Mar. 2015. 2010, pp. 117121.
[26] Y. Chen, N. M. Nasrabadi, and T. D. Tran, Hyperspectral image classifi- [52] M. Honarkhah and J. Caers, Stochastic simulation of patterns using
cation using dictionary-based sparse representation, IEEE Trans. Geosci. distance-based pattern modeling, Math. Geosci., vol. 42, no. 5, pp. 487
Remote Sens., vol. 49, no. 10, pp. 39733985, Oct. 2011. 517, 2010.
[27] Y. Chen, N. M. Nasrabadi, and T. D. Tran, Sparse representation for [53] Y. Qian, F. Yao, and S. Jia, Band selection for hyperspectral imagery
target detection in hyperspectral imagery, IEEE J. Sel. Topics Signal using affinity propagation, IET Comput. Vis., vol. 3, no. 4, pp. 213222,
Process., vol. 5, no. 3, pp. 629640, Jun. 2011. Dec. 2009.
[28] S. Li and H. Qi, Sparse representation based band selection for hyper- [54] C.-I. Chang et al., A joint band prioritization and band-decorrelation
spectral images, in Proc. 18th IEEE Int. Conf. Image Process. (ICIP), approach to band selection for hyperspectral image classification, IEEE
Brussels, Belgium, Sep. 1114, 2011, pp. 26932696. Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 26312641, Nov. 1999.
[29] J. M. Li and Y. T. Qian, Clustering-based hyperspectral band selection [55] T. Cover and P. Hart, Nearest neighbor pattern classification, IEEE
using sparse nonnegative matrix factorization, J. Zhejiang Univ. Sci. C, Trans. Inf. Theory, vol. 13, no. 1, pp. 2127, Jan. 1967.
vol. 12, no. 7, pp. 542549, 2011. [56] I. Steinwart and A. Christmann, Support Vector Machines. Berlin,
[30] Q. Du, J. M. Bioucas-Dias, and A. Plaza, Hyperspectral band selection Germany: Springer-Verlag, 2008.
using a collaborative sparse model, in Proc. IEEE Int. Geosci. Remote [57] R. Rubinstein, M. Zibulevsky, and M. Elad, Efficient implementation
Sens. Symp. (IGARSS), Munich, Germany, Jul. 2227, 2012, pp. 3054 of the K-SVD algorithm using batch orthogonal matching pursuit, CS
3057. Technion, vol. 40, pp. 115, 2008.
[31] S. Chepushtanova, C. Gittins, and M. Kirby, Band selection in hyper-
spectral imagery using sparse support vector machines, in Proc. SPIE
DSS Conf., 2014, p. 90881F.
[32] E. Elhamifar and R. Vidal, Sparse subspace clustering, in Proc. IEEE
Conf. Comput. Vis. Pattern Recog. (CVPR09), Miami, FL, USA, Jun.
2026, 2009, pp. 27902797.
[33] E. Elhamifar and R. Vidal, Sparse subspace clustering: Algorithm, the- Weiwei Sun received the B.S. degree in survey-
ory, and applications, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, ing and mapping and the Ph.D. degree in cartog-
no. 11, pp. 27652781, Nov. 2013. raphy and geographic information engineering from
[34] U. Von Luxburg, A tutorial on spectral clustering, Statist. Comput., Tongji University, Shanghai, China, in 2007 and
vol. 17, no. 4, pp. 395416, 2007. 2013, respectively.
[35] G. H. Mohimani, M. Babaie-Zadeh, and C. Jutten, Fast sparse rep- From 2011 to 2012, he was with the Department
resentation based on smoothed L0 norm, in Independent Component of Applied Mathematics, University of Maryland
Analysis and Signal Separation. New York, NY, USA: Springer, 2007, College Park, College Park, MD, USA, working
pp. 389396. as a Visiting Scholar. He is currently an Assistant
[36] K. Koh, S.-J. Kim, and S. P. Boyd, An interior-point method for large- Professor with Ningbo University, Ningbo, China,
scale L1-regularized logistic regression, J. Mach. Learn. Res., vol. 8, and is also working as a Postdoc with the State Key
no. 8, pp. 15191555, 2007. Laboratory for Information Engineering in Surveying, Mapping, and Remote
[37] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition Sensing (LIESMARS), Wuhan University, Wuhan, China. He has authored
by basis pursuit, SIAM J. Sci. Comput., vol. 20, no. 1, pp. 3361, 1999. more than 20 journal papers. His research interests include hyperspectral image
[38] J. Yang and Y. Zhang, Alternating direction algorithms for L1-problems processing with manifold learning, anomaly detection, and target recognition
in compressive sensing, SIAM J. Sci. Comput., vol. 33, no. 1, pp. 250 of remote sensing imagery using compressive sensing.
278, 2011.
[39] R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy.
Stat. Soc. B, vol. 58, no. 1, pp. 267288, 1996.
[40] H. Zou and T. Hastie, Regularization and variable selection via the
elastic net, J. Roy. Stat. Soc. B, vol. 67, no. 2, pp. 301320, 2005. Liangpei Zhang (M06SM08) received the B.S.
[41] S. Wang et al., Efficient subspace segmentation via quadratic program- degree in physics from Hunan Normal University,
ming, in Proc. 25th AAAI Conf. Artif. Intell., San Francisco, CA, USA, Changsha, China, in 1982, the M.S. degree in optics
Aug. 711, 2011, pp. 519524. from the Xian Institute of Optics and Precision
[42] C.-Y. Lu et al., Robust and efficient subspace segmentation via least Mechanics, Chinese Academy of Sciences, Xian,
squares regression, in Proc. 12th Eur. Conf. Comput. Vis. Part VII, China, in 1988, and the Ph.D. degree in photogram-
Florence, Italy, Oct. 713, 2012, pp. 347360. metry and remote sensing from Wuhan University,
[43] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, Wuhan, China, in 1998.
Maryland: Johns Hopkins University Press, 2012. He is currently the Head of the Remote Sensing
[44] S. Yan and H. Wang, Semi-supervised learning by sparse representa- Division, State Key Laboratory of Information
tion, in Proc. SIAM Int. Conf. Data Min. (SDM09), Sparks, NV, USA, Engineering in Surveying, Mapping, and Remote
Apr. 30/May 2, 2009, pp. 792801. Sensing, Wuhan University. He is also a Chang-Jiang Scholar Chair
[45] Y. Gao, A. Choudhary, and G. Hua, A nonnegative sparsity induced sim- Professor appointed by the Ministry of Education of China. He is currently a
ilarity measure with application to cluster analysis of spam images, in Principal Scientist for the China State Key Basic Research Project (20112016)
Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Dallas, appointed by the Ministry of National Science and Technology of China to lead
TX, USA, Mar. 1419, 2010, pp. 55945597. the remote sensing program in China. He has authored more than 310 research
[46] S. Wu, X. Feng, and W. Zhou, Spectral clustering of high-dimensional papers. He is the holder of five patents. His research interests include hyper-
data exploiting sparse representation vectors, Neurocomputing, vol. 135, spectral remote sensing, high-resolution remote sensing, image processing, and
pp. 229239, 2014. artificial intelligence.
SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2797

Dr. Zhang is an Executive Member (Board of Governor) of the China Weiyue Li received the B.S. degree in geography
National Committee of International GeosphereBiosphere Programme, science from Shandong Normal University, Jinan,
Executive Member of the China Society of Image and Graphics, etc. He reg- China, in 2006, the M.S. degree in photogramme-
ularly serves as a Co-Chair of the series SPIE Conferences on Multispectral try and remote sensing from Liaoning Technical
Image Processing and Pattern Recognition, Conference on Asia Remote University, Huludao, China, in 2010, and the Ph.D.
Sensing, and many other conferences. He edits several conference proceedings, degree in cartography and geography information
issues, and geoinformatics symposiums. He also serves as an Associate Editor engineering from Tongji University, Shanghai, China,
of the IEEE T RANSACTIONS ON G EOSCIENCE AND R EMOTE S ENSING, in 2014.
International Journal of Ambient Computing and Intelligence, International He is working as an Assistant Researcher with
Journal of Image and Graphics, International Journal of Digital Multimedia the Institute of Urban Studies, Shanghai Normal
Broadcasting, Journal of Geo-spatial Information Science, and Journal of University, Shanghai, China. His research interests
Remote Sensing. include feature extraction of hyperspectral imagery and LiDAR data, and the
hazard analysis of landslide.

Bo Du (M11SM15) received the B.S. degree in


engineering from Wuhan University, Wuhan, China, Yenming Mark Lai received the Ph.D. degree in
in 2005, and the Ph.D. degree in photogrammetry applied mathematics from the University of Maryland
and remote sensing from the State Key Laboratory of College Park, College Park, MD, USA, in 2014.
Information Engineering in Surveying, Mapping, and He is working as a Postdoc with the Institute
Remote sensing, Wuhan University, Wuhan, China, in for Computational Engineering and Sciences (ICES),
2010. University of Texas at Austin, Austin, TX, USA.
He is currently an Associate Professor with the His research interests include manifold learning of
School of Computer, Wuhan University. His research hyperspectral imagery and compressive sensing.
interests include pattern recognition, hyperspectral
image processing, and signal processing.

You might also like