
GRD Journals- Global Research and Development Journal for Engineering | Volume 4 | Issue 5 | April 2019

ISSN: 2455-5703

Survey on Feature Extraction using Neural Networks to Classify Remote Sensing Images

T. Gladima Nisia
Assistant Professor
Department of Information Technology
AAA College of Engg & Tech., Sivakasi, Tamil Nadu

Dr. S. Rajesh
Associate Professor
Department of Information Technology
AAA College of Engg & Tech., Sivakasi, Tamil Nadu

Abstract
Remote Sensing (RS) image classification is one of the key research areas in the image processing field. The most important part of this classification is the efficient extraction of features from the RS image, which is itself a complex process. In earlier days, only a limited range of features, such as spectral features, was extracted, and spectral features dominated the classification literature for some years. However, the spatial domain of an RS image contains more information than the spectral features alone. Much research has since been conducted to further improve classification accuracy, leading to feature extraction with different neural networks, which has been shown to increase accuracy. This paper surveys and discusses the work carried out by researchers over time to extract features using neural networks, and provides a brief overview of directions for future research and improvement.
Keywords- Remote Sensing, Feature Extraction, Neural Networks, Spatial Feature, Spectral Feature

I. INTRODUCTION
In recent years, the classification of remote sensing images has become a very attractive field for researchers. An RS image carries a great deal of information in every single pixel, so remote sensing images are used for land-use and land-cover mapping. Land cover is the physical cover of the earth's surface, consisting of forest, water, bare land, saline land, mountain ranges and so on. Land use is land cover converted into a built environment, such as residential buildings, commercial buildings, transport infrastructure and agricultural land. To better understand land-cover/land-use mapping, consider a remotely sensed image of a geographical location: the land cover and land use in it have to be identified and classified.
Land-cover and land-use information are required for many different kinds of spatial planning, from urban planning at the local level up to regional development, and they play an important role in agricultural policy making. Land-cover data are important for the proper management of natural resources, and they are increasingly needed to assess the impact of economic development on the environment. Hence, at various geographical levels they are fundamental for guiding decision making. The Earth's surface is changing at local, regional, national and global scales.
Land management and land planning need the current status of the landscape. Land management depends on understanding the current land-cover status and its uses, and on monitoring changes over time. The reasons for changes in land condition can also be found easily through land-cover mapping. Keeping these applications in mind, the classification of RS images has to be done efficiently. Neural networks are employed to obtain features from RS images, which in turn are used to classify each and every pixel of the image.
The remainder of this paper is organised as follows: Section II discusses some of the deep neural networks used for extracting features from RS images. Section III discusses neural networks that are trained layer-wise. Section IV provides the conclusion of the study.

II. DEEP NEURAL NETWORK ARCHITECTURE

A. Deep Belief Network


A deep belief network (DBN) is a class of deep neural network introduced by Geoffrey Hinton and his students in 2006 [3][4]. A DBN is composed of multiple layers of latent variables ("hidden units"), with connections between the layers but no connections between units within each layer. When trained on a set of examples without supervision, a DBN can learn to probabilistically reconstruct its inputs; the layers then act as feature detectors. After this learning step, a DBN can be further trained with supervision to perform classification.
DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, where each sub-network's hidden layer serves as the visible layer for the next. This stack of RBMs may end with a softmax layer to create a classifier, or it may simply help cluster unlabelled data in an unsupervised learning scenario. Each hidden layer of a DBN thus serves as the output of the layer before it and the input to the layer after it.


1) Training
The training process of a DBN can be divided into two stages: the pre-training stage and the fine-tuning stage. In the pre-training stage, unsupervised learning is carried out in a bottom-up direction for feature extraction, while in the fine-tuning stage a supervised, top-down learning algorithm is used. The improved performance of DBNs can be largely attributed to the pre-training stage, in which the initial weights of the network are learned from the structure of the input data. Compared with randomly initialised weights, these learned weights are closer to a good optimum and can therefore bring better performance.
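As a rough sketch of these two stages (illustrative only; the helper name, hyper-parameters, and toy data below are assumptions, not from the paper), the pre-training stage greedily trains a stack of RBMs bottom-up with one-step contrastive divergence, and fine-tuning then proceeds top-down with labels:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
    """Pre-train one RBM with CD-1 (a hypothetical minimal trainer)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)                            # visible bias
    b = np.zeros(n_hidden)                             # hidden bias
    for _ in range(epochs):
        v0 = data
        h0 = sigmoid(v0 @ W + b)                       # bottom-up pass
        h0_sample = (rng.random(h0.shape) < h0) * 1.0  # stochastic hidden states
        v1 = sigmoid(h0_sample @ W.T + a)              # reconstruction
        h1 = sigmoid(v1 @ W + b)
        W += lr * (v0.T @ h0 - v1.T @ h1) / len(data)  # CD-1 gradient estimate
        a += lr * (v0 - v1).mean(axis=0)
        b += lr * (h0 - h1).mean(axis=0)
    return W, a, b

# Pre-training stage: stack RBMs bottom-up; each hidden layer
# becomes the visible layer of the next RBM.
X = np.random.default_rng(1).random((100, 64))         # toy unlabelled data
layers, inp = [], X
for n_hidden in (32, 16):
    W, a, b = pretrain_rbm(inp, n_hidden)
    layers.append((W, b))
    inp = sigmoid(inp @ W + b)
# Fine-tuning stage (not shown): initialise a feed-forward net with
# these weights and train top-down with labels via backpropagation.
```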

B. Deep Boltzmann Machine


A Deep Boltzmann Machine (DBM) is an unsupervised, probabilistic, generative model with entirely undirected connections between its layers [5]. It contains visible units and multiple layers of hidden units. Like deep belief networks, DBMs have the potential to learn internal representations that become increasingly complex, which is considered a promising way of solving object and speech recognition problems. A DBM uses nonlinear activation functions and can learn complex relationships between features as well as high-level representations of them.

1) Training
The hidden units act as latent variables (features) that allow the Boltzmann machine to model distributions over visible state vectors that cannot be modelled by direct pairwise interactions between the visible units. The learning rule of a Boltzmann machine remains unchanged when hidden units are added, so it is possible to learn binary features that capture higher-order structure in the data.
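For reference, the distribution a two-hidden-layer DBM models can be written through its energy function over the visible layer v and hidden layers h^1, h^2 (the standard form from [5], with bias terms omitted for brevity; this notation sketch is added here for clarity and is not part of the original survey text):

```latex
E(v, h^1, h^2) = -\, v^\top W^1 h^1 \;-\; (h^1)^\top W^2 h^2,
\qquad
P(v) \propto \sum_{h^1,\, h^2} e^{-E(v,\, h^1,\, h^2)}
```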

C. Stacked Autoencoders
A Stacked Autoencoder (SAE) [6] stacks autoencoders into hidden layers, trains them with an unsupervised layer-wise learning algorithm, and is then fine-tuned by a supervised method. SAE training proceeds in three steps. First, train the first autoencoder on the input data and obtain the learned feature vector. Second, use the feature vector of the former layer as the input for the next layer, and repeat this procedure until training completes. Third, after all the hidden layers are trained, use the backpropagation (BP) algorithm to minimise the cost function and update the weights with a labelled training set to achieve fine-tuning.

1) Training
The stacked autoencoder uses greedy layer-wise training to obtain its parameters. First, train the first layer on the raw input to obtain its weight and bias parameters W(1,1), W(1,2), b(1,1), b(1,2). This first layer transforms the raw input into a vector A consisting of the activations of its hidden units. Train the second layer on this vector to obtain the second layer's weight and bias parameters W(2,1), W(2,2), b(2,1), b(2,2). The same procedure is repeated for the remaining layers, using the output of each layer as the input for the subsequent layer. In this way, the parameters of each layer are trained individually, after which fine-tuning is done using backpropagation.
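A minimal NumPy sketch of this greedy procedure (the training loop, layer sizes, and toy data are assumptions for illustration; each autoencoder here is trained by plain gradient descent on the squared reconstruction error):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, epochs=50, lr=0.5, seed=0):
    """Train one autoencoder on X; return its encoder parameters."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = 0.1 * rng.standard_normal((n_in, n_hidden))   # W(l,1): encoder weights
    b1 = np.zeros(n_hidden)                            # b(l,1): encoder bias
    W2 = 0.1 * rng.standard_normal((n_hidden, n_in))   # W(l,2): decoder weights
    b2 = np.zeros(n_in)                                # b(l,2): decoder bias
    for _ in range(epochs):
        A = sigmoid(X @ W1 + b1)                       # hidden activations A
        X_hat = sigmoid(A @ W2 + b2)                   # reconstruction
        d_out = (X_hat - X) * X_hat * (1 - X_hat)      # squared-error gradient
        d_hid = (d_out @ W2.T) * A * (1 - A)
        W2 -= lr * A.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X); b1 -= lr * d_hid.mean(axis=0)
    return W1, b1

# Greedy layer-wise stage: the hidden activations of each trained
# layer become the input to the next autoencoder.
X = np.random.default_rng(1).random((200, 30))
stack, inp = [], X
for n_hidden in (20, 10):
    W1, b1 = train_autoencoder(inp, n_hidden)
    stack.append((W1, b1))
    inp = sigmoid(inp @ W1 + b1)
# Fine-tuning stage (not shown): attach a supervised output layer and
# run backpropagation through the whole stack with labelled data.
```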

D. Stacked Denoising Autoencoders

A Stacked Denoising Autoencoder (SDA) [8] is a stack of denoising autoencoders. It has multiple layers like a multi-layered NN, but its training is different: unsupervised pre-training is done layer by layer as the input is fed through. The input may contain noise. The input is passed through the hidden layer, an output is generated, and the loss is calculated between the output and the original input. This process continues until the loss is minimised. Then the full data is passed through the network and the activations in the hidden layer are collected; these become the new input. Noise is added to this collected input and the same procedure is followed thereafter. After the process is done with the last layer, the data collected in the last hidden layer is the new representation of the data.

1) Training
The network is trained to reconstruct the input from a corrupted version of it. After pre-training is completed to conduct feature selection and extraction on the input from the preceding layer, a second stage of supervised fine-tuning can follow. Once the first k layers are trained, the (k+1)-th layer can be trained, because it is now possible to compute the code, or latent representation, from the layer below. The entire network is then trained like a multilayer perceptron, at which point only the encoding part of each autoencoder is considered. This stage is supervised.
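A sketch of one denoising layer's pre-training step (assumptions: masking noise that zeroes a random ~30% of inputs, tied weights, and toy sizes; note the loss compares the reconstruction with the clean input, not the corrupted one):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = rng.random((200, 40))                  # clean input for this layer
mask = rng.random(X.shape) > 0.3           # masking noise: drop ~30% of entries
X_noisy = X * mask                         # corrupted version fed to the network

W = 0.1 * rng.standard_normal((40, 20))
b = np.zeros(20); b_out = np.zeros(40)

H = sigmoid(X_noisy @ W + b)               # hidden code from the noisy input
X_hat = sigmoid(H @ W.T + b_out)           # reconstruction (tied weights)
loss = ((X_hat - X) ** 2).mean()           # loss against the ORIGINAL clean input
# Repeat with gradient updates until the loss stops improving, then
# collect H on the full clean data as the input for the next layer.
```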

III. LAYER WISE TRAINING FRAMEWORK

A. Restricted Boltzmann Machine


Restricted Boltzmann Machines (RBMs) [7] are 2-layer neural nets that constitute the building blocks of deep belief networks. The first layer of the RBM is the input (visible) layer and the second is the hidden layer, with corresponding bias vectors a and b; for example, an RBM might have six visible units (v1,…,v6) and two hidden units (h1, h2). There is no output layer, and none is needed, since predictions are made differently. The nodes are connected to each other across layers, but no two nodes of the same layer are linked. The RBM is a stochastic neural network.


1) Training
Gibbs sampling is used to train an RBM and to generate data from it. Start with a random state in one of the layers and perform alternating Gibbs sampling: given the states of the units in one layer, all the units in the other layer are updated. This update process carries on until the equilibrium distribution is reached. The weights of the RBM are then obtained by maximising the likelihood of the data under the RBM.
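A sketch of the alternating Gibbs updates for an RBM with six visible and two hidden units, matching the example above (the weights here are random placeholders; in practice they come from maximum-likelihood training such as contrastive divergence):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 2                  # visible units v1..v6, hidden h1, h2
W = 0.5 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)                     # visible bias vector a
b = np.zeros(n_hidden)                      # hidden bias vector b

v = (rng.random(n_visible) < 0.5) * 1.0     # start from a random visible state
for step in range(1000):                    # run the chain towards equilibrium
    p_h = sigmoid(v @ W + b)                # update all hidden units given v
    h = (rng.random(n_hidden) < p_h) * 1.0
    p_v = sigmoid(h @ W.T + a)              # update all visible units given h
    v = (rng.random(n_visible) < p_v) * 1.0
# After many alternations, v is approximately a sample from the RBM.
```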

B. Autoencoder
An Autoencoder (AE) is a simple 3-layer, unsupervised neural network in which the output units are directly connected back to the input units [1]. Typically, the number of hidden units is much smaller than the number of visible ones. It applies backpropagation with the target values set equal to the inputs; that is, the AE is trained to copy its input to its output, and the hidden layer is used to represent the input. An AE is a one-hidden-layer feed-forward neural network similar to the MLP. The difference between an MLP and an AE is that the aim of the AE is to reconstruct the input, while the purpose of the MLP is to predict target values from given inputs. The numbers of nodes in the input layer and the output layer are identical. In the coding process, the AE first converts the input vector x into a hidden representation h using a weight matrix ω; then, in the decoding process, the AE maps h back towards the original input vector to obtain x˜ with another weight matrix ω′. In theory, ω′ should be the transpose of ω. Parameter optimisation minimises the average reconstruction error, and the mean square error (MSE) is used to measure the accuracy of reconstruction.
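The coding and decoding passes described above, in a minimal form (a sketch with arbitrary sizes and tied weights, so ω′ is the transpose of ω):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.random(50)                         # input vector x
W = 0.1 * rng.standard_normal((50, 10))    # weight matrix (omega)
h = sigmoid(x @ W)                         # coding: hidden representation h
x_tilde = sigmoid(h @ W.T)                 # decoding with omega' = omega^T
mse = ((x_tilde - x) ** 2).mean()          # reconstruction error to minimise
```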

1) Training
The training process for an AE can also be divided into two stages: the first stage learns features using unsupervised learning, and the second fine-tunes the network using supervised learning. To be specific, in the first stage feed-forward propagation is performed for each input to obtain the output value x˜, squared error is used to measure the deviation of x˜ from the input value, and the error is then backpropagated through the network to update the weights. In the fine-tuning stage, with the network having suitable features at each layer, the standard supervised learning method is adopted and the gradient descent algorithm is used to adjust the parameters of each layer.

C. Convolutional Neural Network


The Convolutional Neural Network (CNN) [9] is a special kind of feed-forward neural network. In a traditional neural network the neurons are arranged in one dimension, whereas in a convolutional neural network each layer is arranged in three dimensions: height, width and depth. To reduce the number of parameters, the CNN relies on two important concepts: local connectivity and parameter sharing. The architecture of a CNN is shown in Fig. 1.

Fig. 1: Architecture of CNN

There are three main types of layers used to build CNN architectures: (1) the convolutional layer, (2) the pooling layer, and (3) the fully-connected layer. The fully-connected layer is the same as in regular neural networks, and the convolutional layer performs convolution repeatedly over its input. The pooling layer can be thought of as downsampling, for example by taking the maximum of each 2 x 2 block of the previous layer.
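A toy NumPy forward pass through the three layer types (a sketch under assumed shapes, with one filter and no biases; not the architecture of Fig. 1):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))                  # single-channel toy input image
kernel = rng.standard_normal((3, 3))      # one convolutional filter

# Convolutional layer: slide the 3x3 window over the image (stride 1, no pad).
conv = np.array([[np.sum(img[i:i+3, j:j+3] * kernel)
                  for j in range(6)] for i in range(6)])
conv = np.maximum(0, conv)                # ReLU activation

# Pooling layer: maximum of each 2x2 block (stride 2), giving a 3x3 map.
pool = conv.reshape(3, 2, 3, 2).max(axis=(1, 3))

# Fully-connected layer: flatten and apply a dense weight matrix.
w_fc = rng.standard_normal((9, 4))        # 4 hypothetical output classes
scores = pool.reshape(-1) @ w_fc
```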

D. Locally Connected Network


In image processing, the information of an image resides in its pixels, but if a fully connected network is used as before, there are too many parameters: for a 512 x 512 RGB image, each neuron would have 512 x 512 x 3 = 786,432 weights. Such a large number of parameters makes processing very slow and leads to overfitting. Investigation of images and optical systems shows that the features in an image are usually local, and the optical system notices low-level features first. So it is possible to reduce the fully connected network to a locally connected network; this is one of the main ideas of the CNN.
As in most image processing, a square block of the image is locally connected to a neuron. The block size can be 3 x 3 or 5 x 5, for instance; the block is like a feature window in some image processing tasks.


By doing so, the number of parameters is greatly reduced without lowering performance. To extract more features, the same block is connected to another neuron. The depth of a layer is the number of times the same area is connected to different neurons.
The stride is the shifting distance of the window. For example, with stride 1 and window size 3 x 3 on a 7 x 7 x 3 image without zero-padding, there are 5 x 5 x depth neurons in the next layer. If the stride is changed from 1 to 2 and everything else remains the same, there are 3 x 3 x depth neurons in the next layer. In general, with stride s and window size w x w on a W x H image, there are [(W-w)/s+1] x [(H-w)/s+1] x depth neurons in the next layer.
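The output-size arithmetic from this paragraph, checked directly (the helper function is assumed for illustration, not from the paper):

```python
def out_size(W, H, w, s):
    """Neurons per depth slice: w x w window, stride s, no zero-padding."""
    return ((W - w) // s + 1, (H - w) // s + 1)

print(out_size(7, 7, 3, 1))  # (5, 5): the stride-1 example from the text
print(out_size(7, 7, 3, 2))  # (3, 3): the stride-2 example from the text
```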

E. Parameter Sharing
For example, suppose there are 32 x 32 x 5 neurons in the next layer with stride 1, window size 5 x 5 and zero-padding, so the depth is 5. Each neuron has 5 x 5 x 3 = 75 parameters (weights), so there are 75 x 32 x 32 x 5 = 384,000 parameters in the next layer. The idea is to share the parameters within each depth slice: the 32 x 32 neurons in each depth slice use the same parameters. Then there are only 5 x 5 x 3 = 75 parameters per depth slice and 75 x 5 = 375 parameters in total, which greatly decreases the number of parameters. With sharing, the neurons in each depth slice of the next layer are exactly like applying a convolution to the image.
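The same counting, made explicit (the numbers are taken from the paragraph above):

```python
weights_per_neuron = 5 * 5 * 3        # 5 x 5 window over a 3-channel input
neurons_per_slice = 32 * 32
depth = 5

without_sharing = weights_per_neuron * neurons_per_slice * depth
with_sharing = weights_per_neuron * depth   # one shared filter per depth slice

print(without_sharing)  # 384000
print(with_sharing)     # 375
```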

F. Activation Function
In the traditional neuron model, the sigmoid function is often used as the activation function, but other choices are available. One of them is the Rectified Linear Unit (ReLU), defined by f(x) = max(0, x). Krizhevsky et al. [9] compared the performance of the ReLU and sigmoid functions as activation functions in CNNs and found that the ReLU model needs fewer iterations to reach the same training error rate.
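For concreteness, the two activation functions as NumPy one-liners (a trivial sketch):

```python
import numpy as np

relu = lambda x: np.maximum(0.0, x)             # f(x) = max(0, x)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))    # traditional sigmoid

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))      # [0.  0.  0.  1.5]
print(sigmoid(x))   # smooth values in (0, 1)
```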

G. Pooling Layer
Although locally connected networks and parameter sharing are used, there are still many parameters in the network, which can cause overfitting on a relatively small dataset. So pooling layers are often inserted into the network. They progressively reduce the size of the representation and hence the computation time in the network. A pooling layer operates independently on every depth slice of the previous layer, so the depth of the next layer is the same as that of the previous layer. As in the convolutional layer, the number of pixels by which the window moves, the stride, can also be set. Pooling is illustrated in Fig. 2.

Fig. 2: A simple example of the pooling layer

Note that there are two types of pooling layers. If the window size equals the stride, it is traditional pooling; if the window size is larger than the stride, it is overlapping pooling. In practice, a 2 x 2 window with stride 2 is used in traditional pooling, and a 3 x 3 window with stride 2 in overlapping pooling. In addition to max pooling, other functions can also be used: taking the average of the window as the value of the next layer is called average pooling, and using the L2 norm is called L2-norm pooling.
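A sketch contrasting these pooling settings on a toy 6 x 6 depth slice (the helper function and input are assumptions for illustration):

```python
import numpy as np

x = np.arange(36, dtype=float).reshape(6, 6)   # toy depth slice

def pool(x, w, s, op=np.max):
    """Pool with a w x w window and stride s over one depth slice."""
    n = (x.shape[0] - w) // s + 1
    return np.array([[op(x[i*s:i*s+w, j*s:j*s+w])
                      for j in range(n)] for i in range(n)])

print(pool(x, 2, 2))           # traditional max pooling (window = stride)
print(pool(x, 3, 2))           # overlapping max pooling (window > stride)
print(pool(x, 2, 2, np.mean))  # average pooling variant
```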

IV. CONCLUSION
This paper has discussed the different neural networks previously used for extracting features from remote sensing images, including how the individual networks are trained and how layer-wise training proceeds in each. Every network has its own advantages and disadvantages. After long effort by various researchers, the CNN has been found to work better for feature extraction. However, even the CNN has some practical disadvantages, and these issues have to be handled and overcome in future work.

REFERENCES
[1] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length, and Helmholtz free energy," in Advances in Neural Information Processing Systems, 1994.
[2] G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, July 2006.
[3] G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[4] A. Mohamed, G. Dahl, and G. Hinton, "Deep belief networks for phone recognition," in Proc. NIPS Workshop, Dec. 2009.
[5] R. Salakhutdinov and G. E. Hinton, "Deep Boltzmann machines," in Proc. AISTATS, pp. 448-455, 2009.
[6] Y. Qi, Y. Wang, X. Zheng, and Z. Wu, "Robust feature learning by stacked autoencoder with maximum correntropy criterion," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
[7] G. E. Hinton, "A practical guide to training restricted Boltzmann machines," Technical Report UTML TR 2010-003, Department of Computer Science, University of Toronto, 2010.
[8] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Advances in Neural Information Processing Systems (NIPS), 2012.
