

International Journal of Computer Networks and Communications Security

VOL. 1, NO. 4, SEPTEMBER 2013, 165-172
Available online at: www.ijcncs.org
ISSN 2308-9830

Land Cover Classification Using Hidden Markov Models

Dr. GHAYDA A. AL-TALIB¹ and EKHLAS Z. AHMED²

¹ Assist. Prof., Computer Science Department, College of Computer Science and Mathematics, Mosul University, Mosul, Iraq
² M.Sc. Student, Computer Science Department, College of Computer Science and Mathematics, Mosul University, Mosul, Iraq

E-mail: ¹ ghaydatalib@yahoo.com, ² ekh_aa2007@yahoo.com


ABSTRACT

This paper proposes a classification approach that exploits the high recognition ability of Hidden Markov Models (HMMs) to achieve high classification accuracy by using the spatial inter-pixel dependencies (i.e., the context) as well as the spectral information. Applying unsupervised classification to remote sensing images converts the raw image data into useful information while achieving high classification accuracy. Other clustering schemes, such as the traditional k-means, do not take the spatial inter-pixel dependencies into account. Experimental work has been conducted on a set of 10 multispectral satellite images. The proposed algorithm was verified on simulated images and applied to selected satellite images in the MATLAB environment.
Keywords: Hidden Markov Models (HMM), Land cover, Multispectral Satellite Images, Clustering,
Unsupervised classification.

1 INTRODUCTION

In this paper, Hidden Markov Models (HMMs) are used for unsupervised satellite image classification. HMMs have been extensively and successfully used for texture modeling and segmentation (i.e., classification), mainly owing to their ability to model contextual dependencies and to absorb noise [1]. Land cover is an important geospatial variable for studying human and physical environments and is increasingly used as input data in spatially explicit ecological and environmental models, ranging from global climate change to detailed studies of soil erosion [2]. During the last decades, new sensors and easily accessible data archives have increased the amount of available remote sensing data. This development necessitates robust, transferable and automated methods and processing services for deriving accurate information within a short period of time [3]. Image classification refers to the computer-assisted interpretation of remotely sensed images. There are mainly two ways to perform remote sensing image classification: visual interpretation and automatic interpretation by computer [4]. Classification is an important process that translates the raw image data into meaningful and understandable information. The aim of image classification is to assign each pixel of the image to a class with regard to a feature space. The features can be basic image properties such as intensity or amplitude, or more advanced abstract image descriptors such as textures [5]. In this work the intensity of the satellite images is used to classify the land cover. Automatic classification of remotely sensed imagery follows two general approaches: supervised and unsupervised. The supervised approach classifies the land objects in the image by establishing training samples of typical land objects, such as buildings, water areas, vegetation, roads, etc., and it usually requires less time [4]. The unsupervised approach automatically clusters the image into a number of groups according to a specific predefined criterion [6]. In the unsupervised case, some of the statistical properties of the different classes are unknown and have to be estimated with iterative methods such as expectation-maximization
(EM) [7]. In the first step of this work the model parameters are set randomly; their optimal values are then estimated with the Baum-Welch algorithm, which is the most widely adopted methodology for model parameter estimation [8]. Once the model parameters are well estimated, the clustering and classification process is carried out with the Viterbi algorithm. Finally, the classified image is displayed.

2 HIDDEN MARKOV MODELS

An HMM is distinguished from a general Markov model in that the states of an HMM cannot be observed directly (i.e., they are hidden) and can only be estimated through a sequence of observations generated along a time series. Assume the total number of states is N, and let $q_t$ and $o_t$ denote the system state and the observation at time t. An HMM can be formally characterized by $\lambda = (A, B, \pi)$, where A is the matrix of transition probabilities between states, B is the matrix of observation probability densities relating to the states, and $\pi$ is the vector of initial state probabilities. Specifically, A, B, and $\pi$ are each represented as follows [6]:
$$A = [a_{ij}], \quad a_{ij} = P(q_{t+1} = j \mid q_t = i), \quad 1 \le i, j \le N \qquad (1)$$

Fig. 1. HMM parameters A, B and $\pi$ in the case of t = 1.

where

$$a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1, \quad \text{for } i = 1, 2, \ldots, N \qquad (2)$$

$$B = [b_j(o_t)], \quad b_j(o_t) = P(o_t \mid q_t = j), \quad 1 \le j \le N \qquad (3)$$

$$\pi = [\pi_i], \quad \pi_i = P(q_1 = i), \quad 1 \le i \le N \qquad (4)$$

where

$$\sum_{i=1}^{N} \pi_i = 1 \qquad (5)$$
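For a concrete feel of the parameter set $\lambda = (A, B, \pi)$ defined in equations (1)-(5), the following minimal Python/NumPy sketch builds a toy two-state, three-symbol HMM; the numerical values are invented purely for illustration and are not taken from this work:

import numpy as np

# Toy 2-state, 3-symbol HMM; the values below are illustrative only.
A = np.array([[0.7, 0.3],          # a_ij = P(q_{t+1}=j | q_t=i), eq. (1)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # b_j(o_t) = P(o_t | q_t=j), eq. (3)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])          # pi_i = P(q_1=i), eq. (4)

# Constraints of eqs. (2) and (5): each row of A and B, and pi itself, sums to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)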
For illustration purposes, an HMM and the related parameters A, B, and $\pi$ are shown in Fig. 1. Given an HMM $\lambda$ and an observation sequence $O = \{o_1, o_2, \ldots, o_T\}$, one may estimate the best state sequence $Q^* = \{q_1, q_2, \ldots, q_T\}$ with a dynamic programming approach so as to maximize $P(Q^* \mid O, \lambda)$ [5]. In order to make $Q^*$ meaningful, the model parameters A, B and $\pi$ have to be well estimated. The Baum-Welch algorithm is the most widely adopted methodology for model parameter estimation. The model parameters $\pi_i$, $a_{ij}$ and $b_j(k)$ are re-estimated as:


$$\bar{\pi}_i = \gamma_1(i) \qquad (6)$$

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad i = 1, 2, \ldots, N \qquad (7)$$

$$\bar{b}_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad j = 1, 2, \ldots, N \qquad (8)$$


where $\gamma_t(i)$ denotes the conditional probability of being in state i at time t, given the observations, and $\xi_t(i, j)$ is the conditional probability of a transition from state i at time t to state j at time t + 1, given the observations. Both $\gamma_t(i)$ and $\xi_t(i, j)$ can be solved in terms of the well-known forward-backward algorithm [9]. Define the forward probability $\alpha_t(i)$ as the joint probability of observing the first t observations $O_{1:t} = \{o_1, o_2, \ldots, o_t\}$ and being in state i at time t. The $\alpha_t(i)$ can be computed inductively by the following formulas:

$$\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N \qquad (9)$$

$$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}, \quad 1 \le t \le T - 1, \quad 1 \le j \le N \qquad (10)$$

Let the backward probability $\beta_t(i)$ be the conditional probability of observing the observation sequence $O_{t+1:T} = \{o_{t+1}, o_{t+2}, \ldots, o_T\}$ after time t, given that the state at time t is i. As with the forward probability, $\beta_t(i)$ can be computed inductively as:

$$\beta_T(i) = 1, \quad 1 \le i \le N \qquad (11)$$



$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \quad 1 \le i \le N \qquad (12)$$

The probabilities $\gamma_t(i)$ and $\xi_t(i, j)$ are then given by:

$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)} \qquad (13)$$

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)} \qquad (14)$$
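Equations (6)-(14) can be realized as a single forward-backward pass followed by one Baum-Welch re-estimation step. The sketch below is an illustrative Python/NumPy version rather than the implementation used in this work; it assumes a discrete observation alphabet and omits the probability scaling that a practical implementation needs for long pixel sequences:

import numpy as np

def forward_backward(A, B, pi, obs):
    """Compute alpha, beta, gamma and xi for a discrete HMM (eqs. 9-14).
    obs is a sequence of symbol indices. No scaling is applied, so this is
    suitable only for short sequences."""
    N, T = A.shape[0], len(obs)

    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                               # eq. (9)
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)           # eq. (10)

    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                          # eq. (11)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])         # eq. (12)

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)                  # eq. (13)

    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t, :, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] = num / num.sum()                                # eq. (14)
    return alpha, beta, gamma, xi

def baum_welch_step(A, B, pi, obs):
    """One re-estimation pass over a single observation sequence (eqs. 6-8)."""
    obs = np.asarray(obs)
    _, _, gamma, xi = forward_backward(A, B, pi, obs)
    new_pi = gamma[0]                                          # eq. (6)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # eq. (7)
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                                # eq. (8)
        new_B[:, k] = gamma[obs == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi

In practice the re-estimation pass is repeated until the parameters stop changing; as reported in the results section, convergence here takes ten or eleven iterations.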

As a result, if both the observation density $b_i(o_t)$ and the observation sequence $O = \{o_1, o_2, \ldots, o_T\}$ are well managed, then the decoded hidden state sequence will be closer to the ideal one. Moreover, the Viterbi algorithm is usually employed to perform global decoding, which finds the most likely sequence of states jointly rather than decoding the state of each observation separately. The aim of using the Viterbi algorithm is to find the most likely sequence of latent states corresponding to the observed sequence of data [10]; in other words, it finds the single best state sequence for an observation sequence. The Viterbi algorithm is another trellis algorithm and is very similar to the forward algorithm, except that the transition probabilities are maximized at each step instead of summed [11]. First define:

$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P(q_1 q_2 \cdots q_t = s_i,\; o_1, o_2 \cdots o_t \mid \lambda) \qquad (15)$$

as the probability of the most probable state path for the partial observation sequence ending in state $s_i$ at time t. The Viterbi algorithm steps can then be stated as:

1. Initialization:

$$\delta_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N \qquad (16)$$

2. Recursion:

$$\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right] b_j(o_t), \quad 2 \le t \le T, \; 1 \le j \le N \qquad (17)$$

$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right], \quad 2 \le t \le T, \; 1 \le j \le N \qquad (18)$$

3. Termination:

$$P^* = \max_{1 \le i \le N} \left[\delta_T(i)\right] \qquad (19)$$

$$q_T^* = \arg\max_{1 \le i \le N} \left[\delta_T(i)\right] \qquad (20)$$
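The recursion in equations (16)-(20) translates directly into code. The following sketch is an illustrative Python/NumPy version, not the implementation used in this work; it works in log probabilities so that long observation sequences, such as a whole row of image pixels, do not underflow:

import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely state sequence for a discrete observation sequence,
    following eqs. (16)-(20). Zero probabilities become -inf in the log
    domain, which max/argmax handle correctly."""
    N, T = A.shape[0], len(obs)
    with np.errstate(divide="ignore"):
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)

    delta = np.zeros((T, N))                              # delta_t(j), eq. (15)
    psi = np.zeros((T, N), dtype=int)                     # back-pointers, eq. (18)
    delta[0] = logpi + logB[:, obs[0]]                    # initialization, eq. (16)

    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA             # delta_{t-1}(i) + log a_ij
        psi[t] = scores.argmax(axis=0)                    # eq. (18)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]   # eq. (17)

    q = np.zeros(T, dtype=int)                            # backtracking, eqs. (19)-(20)
    q[T - 1] = delta[T - 1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return q

When the observations are coded pixel values, the returned state indices are the cluster labels that are later mapped to land-cover classes.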

When an HMM is implemented for unsupervised image classification, the pixel values (or vectors) correspond to the observations and, after the estimation of the model parameters is completed, the hidden state corresponds to the cluster to which each pixel belongs. So the first step is to set the model parameters, the second step is to apply the Viterbi algorithm to segment the pixels of the image into several clusters, and the last step is to map these clusters to a number of classes determined by the user.
3 METHODOLOGY

Usually, a traditional k-means clustering scheme uses the spectral data alone when classifying satellite images. In this work, both the spectral and the spatial information are used to classify the satellite imagery. The method used is an observation-density-based method for sequencing the pixels of the satellite images: it takes blocks of pixels of size (2*2) and (3*3), as well as cross neighbours, so that the scope of the observation is extended by combining each pixel with its neighbouring pixels to form a new observation vector. Each block of size (2*2) or (3*3), and each cross neighbourhood, therefore represents one observation. Following such a consideration, to build up an HMM, the pixel-fitting direction aligned with the row direction (i.e., strip-like) has been maintained, but for each pixel of interest the vertical and horizontal neighbouring pixel(s) are added in order to take the vertical and horizontal spatial dependencies into account. In this method (4 pixels), (9 pixels) and (5 pixels), respectively, contribute to a single observation, and as a result the classification becomes more accurate because the probability of each observation (pixel group) belonging to its class is maximized. For illustration, a sample image of (5*5) pixels is used to show the method, as in Fig. 2.



Fig. 2. (a) A (5*5) sample image; (b) observations represented by blocks of (2*2) neighbourhood pixels; (c) observations represented by blocks of (3*3) neighbourhood pixels; (d) observations from the cross-neighbourhood pixels.
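As an illustration of this sequencing step, the following Python/NumPy sketch turns a grey-level image into a one-dimensional list of observation vectors for the three neighbourhood schemes of Fig. 2. The paper does not state how block boundaries and image borders are handled, so this sketch simply tiles non-overlapping blocks and skips the border pixels in the cross neighbourhood; the function name is a hypothetical one chosen for illustration:

import numpy as np

def neighbourhood_sequence(img, mode="cross"):
    """Scan a 2-D grey-level image row by row (strip-like) and combine each
    pixel with its neighbours into one observation vector.
    'block2' -> (2*2) blocks (4 pixels), 'block3' -> (3*3) blocks (9 pixels),
    'cross'  -> the pixel plus its vertical/horizontal neighbours (5 pixels)."""
    h, w = img.shape
    obs = []
    if mode == "block2":
        for r in range(0, h - 1, 2):                 # non-overlapping 2x2 tiles
            for c in range(0, w - 1, 2):
                obs.append(img[r:r + 2, c:c + 2].ravel())
    elif mode == "block3":
        for r in range(0, h - 2, 3):                 # non-overlapping 3x3 tiles
            for c in range(0, w - 2, 3):
                obs.append(img[r:r + 3, c:c + 3].ravel())
    elif mode == "cross":
        for r in range(1, h - 1):                    # border pixels are skipped
            for c in range(1, w - 1):
                obs.append(np.array([img[r, c], img[r - 1, c], img[r + 1, c],
                                     img[r, c - 1], img[r, c + 1]]))
    return np.array(obs)

# Example on a (5*5) image, as in Fig. 2.
img = np.arange(25, dtype=np.uint8).reshape(5, 5)
print(neighbourhood_sequence(img, "cross").shape)    # (9, 5): nine 5-pixel observations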

4 THE TEST AREA

Two test areas have been considered, one is
located in the north east of Kirkuk and the other is
located in the south Dohuk in Iraq. Both areas
contains the same classes of land cover which are
(water, vegetation, bare soil, and rocky areas). The
Landsat satellite imagery captured at 2012, the
resolution of the image is (3030) meter which
means that each pixel represents a(3030) meter on
the earth. Satellite imagery can provide a
convenient means of monitoring the evolution of
the area. For experimental purposes, a test area with
(300 300) pixels was extracted from the Landsat
multispectral imagery to facilitate the analysis of
the classification methodology. Each classified
image was then evaluated in terms of its visual
quality. The test area is shown in Fig. 3.







































Fig. 3. (a) Landsat multispectral imagery in terms
of false color composite data, in the north of
Iraq,(b) Test area (1), (c) Test area (2).
5 THE PROPOSED ALGORITHM
This paper uses unsupervised Hidden Markov Models (HMMs) to classify the land cover in multispectral satellite images and proposes a new technique for sequencing the pixels (observations) of the image so as to fit them into the HMM and obtain the best classification. Fig. 4 shows the flowchart of this work.
In the first step the test areas are determined and the best bands are selected; here the band combination (band 2, band 4, band 7) has been chosen for the two test areas. After that the image is loaded into the program, the number of classes in the image is defined by the user, and the image is converted from RGB into a grey-scale image. In the next step each pixel is coded according to its intensity.
The image is then converted from a two-dimensional representation into one dimension (a sequence of pixels, or a vector of observations), because the original HMM theory assumes a one-dimensional, linearly dependent structure. This research uses blocks of (2*2) and (3*3) neighbours, as well as the cross-neighbours method, to exploit the dependencies between pixels. In this way the 2-D information can be embedded into the HMM while the original one-dimensional linear structure of the HMM is retained.

Fig. 4. The flowchart showing the steps of the
proposed algorithm.
After the model parameters (A, B, $\pi$) have been estimated and a threshold has been specified for each class, the Viterbi algorithm for unsupervised image classification is applied to classify each pixel of the image. The pixel values (or vectors) correspond to the observations and the hidden states correspond to the clusters to which the pixels belong. If the resulting classified image is the best classification, it is saved and displayed; otherwise the model parameters are updated so as to maximize the probability of the observation sequence, and the procedure is repeated until the best classification is obtained, after which the resulting image is saved and displayed.
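The overall flow of Fig. 4 (grey-scale conversion, sequencing into observation vectors, Baum-Welch parameter estimation, Viterbi decoding, and mapping the decoded states back onto the image grid) can be sketched as follows. This is an illustrative Python version rather than the MATLAB implementation used in this work: it relies on the third-party hmmlearn library, whose GaussianHMM class performs Baum-Welch fitting and Viterbi decoding, and it uses a random stand-in image in place of the Landsat band combination:

import numpy as np
from hmmlearn import hmm   # third-party library, assumed to be installed

def classify_image(gray, n_classes=4, block=2):
    """Unsupervised classification sketch: non-overlapping block*block pixel
    groups form the observation vectors (rows scanned in order), an HMM is
    fitted by Baum-Welch, and Viterbi decoding assigns each block to a hidden
    state, which is taken as its cluster label."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block              # crop so blocks tile the image
    gray = gray[:h, :w].astype(float)

    # Build the observation sequence: one vector per block, row-major order.
    obs = (gray.reshape(h // block, block, w // block, block)
               .transpose(0, 2, 1, 3)
               .reshape(-1, block * block))

    # Baum-Welch estimation and Viterbi decoding; the library chooses its own
    # initial parameters (the paper initializes them randomly).
    model = hmm.GaussianHMM(n_components=n_classes, covariance_type="diag",
                            n_iter=20, random_state=0)
    model.fit(obs)
    states = model.predict(obs)                      # most likely state per block

    # Map the cluster labels back onto the pixel grid, one label per block.
    label_map = states.reshape(h // block, w // block)
    return np.kron(label_map, np.ones((block, block), dtype=int))

# Usage with a random stand-in image; a real band combination would replace it.
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(300, 300))
print(classify_image(demo, n_classes=4).shape)       # (300, 300), one label per pixel

The decoded labels would then be mapped to information classes (water, vegetation, bare soil, and so on) using ground knowledge, as described above.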
6 RESULTS AND DISCUSSION
The schemes described earlier, which use HMMs to pursue higher classification accuracy in an unsupervised sense, have been implemented for the test imagery. An HMM was constructed for each method according to the parameter estimation process, and the state of each corresponding pixel was then extracted to form the classified imagery. For all HMMs the initial model parameters, namely A, B, and $\pi$, are randomly assigned [12]. The iterative estimation of the model parameters converges within ten or eleven runs. The hidden state (i.e., cluster) sequence $Q^*$ is then estimated using the Viterbi algorithm (i.e., dynamic programming) so as to maximize the probability of the sequence of states given the observations and the model, $P(Q^* \mid O, \lambda)$. The resulting clusters are then mapped to the information classes according to knowledge of the ground data. The resulting images of the unsupervised classifications using HMMs based on the (2*2), (3*3) and cross-neighbourhood observation sequences are shown in Fig. 5 (a) to (c) for test area (1) and in Fig. 6 (a) to (c) for test area (2). In both areas the three results show that the (3*3) neighbourhood method gives the lowest classification accuracy and the cross neighbourhood gives the highest. In the three results of the first test area there are three classes (water, vegetation, bare soil); the water body is classified best, as are the vegetation areas; there was some confusion between bare soil and water in all three results, with the least confusion in the (2*2) neighbourhood method. In the three results of the second test area there are six classes (water, vegetation, bare soil, rocky lands, high soil moisture, and shades).


















Fig. 5. The result of classification of test area (1): (a) the result of the (3*3) neighbourhood, (b) the result of the (2*2) neighbourhood, and (c) the result of the cross neighbourhood.






















Fig. 6. The result of classification of test area (2): (a) the result of the (3*3) neighbourhood, (b) the result of the (2*2) neighbourhood, and (c) the result of the cross neighbourhood.


Fig. 7 (a) and (b) show the class distributions for the two test areas.

Fig. 7. Class distribution: (a) class distribution for test area (1); (b) class distribution for test area (2).


















7 CONCLUSION

Schemes for incorporating the 2-D spatial information of remotely sensed imagery into a one-dimensional linear HMM have been proposed and demonstrated, in terms of visual quality, through unsupervised classifications. In this paper the three proposed methods are applied to Landsat imagery with a (30*30) meter resolution, which was the only imagery available for this work; if these methods were applied to higher-resolution remotely sensed imagery, the accuracy and the visual quality would be expected to improve.

8 REFERENCES

[1] Mohamed El Yazid Boudaren and Abdel Belaid, "A new scheme for land cover classification in aerial images: combining extended dependency tree-HMM and unsupervised segmentation," Lecture Notes in Electrical Engineering - Electronic Engineering and Computing Technology, Springer, inria-00579704, version 1-24, pp. 471-482, Mar 2011.
[2] Weidong Li and Chuanrong Zhang, "A Markov chain geostatistical framework for land-cover classification with uncertainty assessment based on expert-interpreted pixels from remotely sensed imagery," IEEE Transactions on Geoscience and Remote Sensing, Vol. 49, No. 8, pp. 2983-2992, August 2011.
[3] Juliane Huth, Claudia Kuenzer, Thilo Wehrmann, Steffen Gebhardt, Vo Quoc Tuan, and Stefan Dech, "Land cover and land use classification with TWOPAC: towards automated processing for pixel- and object-based image classification," Remote Sensing, Vol. 4, pp. 2530-2553, doi:10.3390/rs4092530, 2012.
[4] Qian Wang, Jianping Chen, and Yi Tian, "Remote sensing image interpretation study serving urban planning based on GIS," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B4, Beijing, 2008.
[5] Koray Kayabol and Josiane Zerubia, "Unsupervised amplitude and texture based classification of SAR images with multinomial latent model," Institut National de Recherche en Informatique et en Automatique (INRIA), hal-00612491, version 2, May 2012.
[6] B. Tso and R. C. Olsen, "Combining spectral and spatial information into hidden Markov models for unsupervised image classification," International Journal of Remote Sensing, Vol. 26, No. 10, pp. 2113-2133, 20 May 2005.
[7] Roger Fjørtoft, Jean-Marc Boucher, Yves Delignon, René Garello, Jean-Marc Le Caillec, Henri Maître, Jean-Marie Nicolas, Wojciech Pieczynski, Marc Sigelle, and Florence Tupin, "Unsupervised classification of radar images based on hidden Markov models and generalised mixture estimation," Proc. SAR Image Analysis, Modeling, and Techniques V, Vol. SPIE 4173, Barcelona, Spain, September 2000.
[8] Waleed Abdulla and Nikola Kasabov, "The concept of hidden Markov model in speech recognition," The Information Science Discussion Paper Series, No. 99/09, ISSN 1177-45X, May 1999.
[9] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," The Annals of Mathematical Statistics, Vol. 41, No. 1, pp. 164-171, 1970.
[10] Silvia Pandolfi and Francesco Bartolucci, "A new constant memory recursion for hidden Markov models," FIRB (Futuro in Ricerca 2012), Perugia (IT), March 15-16, 2013.
[11] Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, 2nd Ed., Prentice-Hall, 2000, ISBN: 0-13-095069-6.
[12] Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
