International Journal of Image Processing (IJIP)
Edited By
Computer Science Journals
www.cscjournals.org
Editor-in-Chief: Professor Hu, Yu-Chen
This work is subject to copyright. All rights are reserved, whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting,
re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any
other way, and storage in data banks. Duplication of this publication or parts
thereof is permitted only under the provisions of the copyright law 1965, in its
current version, and permission for use must always be obtained from CSC
Publishers. Violations are liable to prosecution under the copyright law.
©IJIP Journal
Published in Malaysia
CSC Publishers
Editorial Preface
Highly professional scholars contribute their effort, valuable time, expertise and
motivation to IJIP as Editorial Board members. All submissions are evaluated by
the International Editorial Board, which ensures that significant developments in
image processing from around the world are reflected in IJIP publications.
IJIP editors understand how important it is for authors and researchers to have
their work published with minimum delay after submission of their papers. They
also strongly believe that direct communication between editors and authors is
important for the welfare, quality and wellbeing of the journal and its readers.
Therefore, all activities from paper submission to publication are handled through
electronic systems, including electronic submission, an editorial panel and a
review system, which ensure rapid decisions and the least delay in the
publication process.
To build the journal's international reputation, we disseminate publication
information through Google Books, Google Scholar, the Directory of Open Access
Journals (DOAJ), Open J-Gate, ScientificCommons, Docstoc and many more.
Our international editors are working on establishing ISI listing and a good
impact factor for IJIP. We would like to remind you that the success of our
journal depends directly on the number of quality articles submitted for
review. Accordingly, we would like to request your participation by
submitting quality manuscripts for review and by encouraging your colleagues to
do the same. One of the great benefits we can provide to our prospective
authors is the mentoring nature of our review process: IJIP provides authors
with high-quality, helpful reviews that are shaped to assist them in improving
their manuscripts.
Editor-in-Chief (EiC)
Professor Hu, Yu-Chen
Providence University (Taiwan)
Pages

89 - 105    Determining the Efficient Subband Coefficients of Biorthogonal Wavelet
            for Gray-Level Image Watermarking
            Nagaraj V. Dharwadkar, B. B. Amberker

106 - 118   A Novel Multiple License Plate Extraction Technique for Complex
            Background in Indian Traffic Conditions
            Chirag N. Paunwala

156 - 163   Contour Line Tracing Algorithm for Digital Topographic Maps
            Ratika Pradhan, Ruchika Agarwal, Shikhar Kumar, Mohan P. Pradhan,
            M. K. Ghose

164 - 174   Automatic Extraction of Open Space Area from High Resolution
            Urban Satellite Imagery
            Hiremath P. S., Kodge B. G.
Determining the Efficient Subband Coefficients of Biorthogonal Wavelet for
Gray-Level Image Watermarking

B. B. Amberker    bba@nitw.ac.in
Professor, Department of Computer Science and Engineering
National Institute of Technology (NIT), Warangal, (A.P.), India
Abstract
In this paper, we propose an invisible blind watermarking scheme for gray-level
images. The cover image is decomposed using the Discrete Wavelet Transform
with biorthogonal wavelet filters, and the watermark is embedded into significant
coefficients of the transform. The biorthogonal wavelet is used because it has the
properties of perfect reconstruction and smoothness. The proposed scheme
embeds a monochrome watermark into a gray-level image. In the embedding
process, we use a localized decomposition, meaning that the second-level
decomposition is performed on a detail subband resulting from the first-level
decomposition. After the first-level decomposition, the horizontal, vertical and
diagonal subbands are each decomposed separately at the second level, and
from each second-level decomposition we take the respective horizontal,
vertical and diagonal coefficients for embedding the watermark. The robustness
of the scheme is tested against different types of image processing attacks,
such as blurring, cropping, sharpening, Gaussian filtering and salt-and-pepper
noise. The experimental results show that embedding the watermark into the
diagonal subband coefficients is robust against these different types of attacks.
1. INTRODUCTION
Digitized media content is becoming more and more important. However, with the popularity of
the Internet and the characteristics of digital signals, related problems are also on the rise. The
rapid growth of digital imagery, coupled with the ease with which digital information can be
duplicated and distributed, has led to the need for effective copyright protection tools. From this
point of view, digital watermarking is a promising technique to protect data from illicit copying
[1][2]. Watermarking algorithms can be classified from several viewpoints. One viewpoint is
whether the cover image is required to decode the watermark: if it is, the algorithm is known as
non-blind or private [3]; if the cover image is not needed to decode the watermark bits, the
algorithm is known as blind or public [4]. Another viewpoint is the processing domain: spatial or
frequency. Many techniques have been proposed in the spatial domain, such as LSB (least
significant bit) insertion [5][6]; these schemes usually feature low computation and a large hiding
capacity, but their drawback is weak robustness. Others are based on transform techniques, such
as the DCT, DFT and DWT domains. The latter have become more popular because they provide
a natural framework for incorporating perceptual knowledge into the embedding algorithm, which
is conducive to better perceptual quality and robustness [7].
Recently, the Discrete Wavelet Transform has gained popularity because of the multi-resolution
analysis it provides. Wavelets can be orthogonal (orthonormal) or biorthogonal. Most wavelets
used in watermarking have been orthogonal; the scheme in [8], for example, introduces a
semi-fragile watermarking technique that uses orthogonal wavelets. Very few watermarking
algorithms have used biorthogonal wavelets. The biorthogonal wavelet transform is invertible and
has some favorable properties over the orthogonal wavelet transform, mainly perfect
reconstruction and smoothness. Kundur and Hatzinakos [9] suggested a non-blind watermarking
model using biorthogonal wavelets, based on embedding a watermark in the detail wavelet
coefficients of the host image. Their results showed that the model was robust against numerous
signal distortions, but it is a non-blind watermarking algorithm that requires the presence of the
watermark at the detection and extraction phases.
One of the main differences between our technique and other wavelet watermarking schemes
lies in how the host image is decomposed. Our scheme decomposes the image using a first-level
biorthogonal wavelet and then further decomposes one of the resulting detail subbands (LH, HL
or HH), as in [12], except that we embed the watermark bits directly by changing the frequency
coefficients of the subbands. We do not use a pseudo-random number sequence to represent
the watermark; the frequency coefficients are modified directly using the watermark bits. The
extraction algorithm does not need the cover image, so it is a blind watermarking algorithm: the
watermark is extracted by scanning the modified frequency coefficients. We evaluated the
essential properties of the proposed method, i.e. robustness and imperceptibility, under different
embedding strengths. Robustness refers to the ability to survive intentional attacks as well as
accidental modifications; as intentional attacks we used blurring, noise insertion, region cropping
and sharpening. Imperceptibility, or fidelity, means the perceptual similarity between the
watermarked image and its cover image, measured here using the Entropy, Standard Deviation,
RMS, MSE and PSNR parameters.
2. BIORTHOGONAL WAVELET TRANSFORM
The DWT (Discrete Wavelet Transform) transforms a discrete signal from the time domain into
the time-frequency domain. The product of the transformation is a set of coefficients organized in
a way that enables not only spectral analysis of the signal, but also analysis of the spectral
behavior of the signal in time. Wavelets have the property of smoothness [10]; this property is
available in both orthogonal and biorthogonal wavelets. However, there are properties that are
not available in orthogonal wavelets but exist in biorthogonal wavelets, namely exact
reconstruction and symmetry. Another advantageous property of biorthogonal over orthogonal
wavelets is a higher embedding capacity when they are used to decompose the image into
different channels. All these properties make biorthogonal wavelets promising in the
watermarking domain [11].
Let (L, R) be a wavelet matrix pair of rank m and genus g, and let f : \mathbb{Z} \to \mathbb{C}
be any discrete function. Then

f(n) = \sum_{r=0}^{m-1} \sum_{k \in \mathbb{Z}} c_k^r \, a'^{\,r}_{n-mk}    (1)

with

c_k^r = \frac{1}{m} \sum_{n \in \mathbb{Z}} f(n) \, \overline{a^r_{n-mk}}    (2)

so that

\frac{1}{m} \sum_{r=0}^{m-1} \sum_{k \in \mathbb{Z}} \Big( \sum_{n \in \mathbb{Z}} f(n) \, \overline{a^r_{n-mk}} \Big) a'^{\,r}_{n-mk} = f(n)    (3)

We call L = (a^r_{n-mk}) the analysis matrix of the wavelet matrix pair and R = (a'^{\,r}_{n-mk})
the synthesis matrix; they can also be referred to simply as the left and right matrices in the
pairing (L, R). The terminology refers to the fact that the left matrix in the above equations is used
for analyzing the function in terms of wavelet coefficients, and the right matrix is used for
reconstructing, or synthesizing, the function as a linear combination of the vectors formed from its
coefficients. This is simply a convention, as the roles of the matrices can be interchanged, but in
practice it is a useful one. For instance, the analysis wavelet functions can be chosen to be less
smooth than the corresponding synthesis functions, and this trade-off is useful in certain contexts.

If

f(n) = \sum_{r=0}^{m-1} \sum_{k \in \mathbb{Z}} c_k^r \, a^r_{n-mk}    (4)

is the expansion of f relative to a single wavelet matrix A, then the formula

\sum_{n} |f(n)|^2 = \sum_{r} \sum_{k} |c_k^r|^2    (5)

is valid. This equation describes how the "energy" represented by the function f is partitioned
among the orthonormal basis functions (a^r_{n-mk}). For wavelet matrix pairs the formula that
describes the partition of energy is more complicated, since expansions with respect to both the
L-basis and the R-basis are involved. The corresponding formula is

\sum_{n} |f(n)|^2 = \sum_{r} \sum_{k} c_k^r \, \overline{c'^{\,r}_k}    (6)

where

c'^{\,r}_k = \frac{1}{m} \sum_{n \in \mathbb{Z}} f(n) \, \overline{a'^{\,r}_{n-mk}}    (7)

Let L = (a^r_k), R = (a'^{\,r}_k) be a wavelet matrix pair. Then the compactly supported functions
in L^2(\mathbb{R}) of the form

\{ \varphi, \varphi', \psi^r, \psi'^{\,r} : r = 1, \dots, m-1 \}    (8)

defined by

\varphi(x) = \sum_{k} a^0_k \, \varphi(mx - k)    (9)

\psi^r(x) = \sum_{k} a^r_k \, \varphi(mx - k), \quad r = 1, \dots, m-1    (10)

\varphi'(x) = \sum_{k} a'^{\,0}_k \, \varphi'(mx - k)    (11)

\psi'^{\,r}(x) = \sum_{k} a'^{\,r}_k \, \varphi'(mx - k), \quad r = 1, \dots, m-1    (12)

are called the biorthogonal scaling functions \{\varphi(x), \varphi'(x)\} and the biorthogonal
wavelet functions \{\psi^r, \psi'^{\,r} : r = 1, \dots, m-1\}, respectively. We call \{\varphi, \psi^r\}
the analysis functions and \{\varphi', \psi'^{\,r}\} the synthesis functions. Using rescalings and
translates of these functions we obtain the general biorthogonal wavelet system associated with
the wavelet matrix pair (L, R):

\varphi_k(x), \; \psi^r_{jk}(x), \quad r = 1, \dots, m-1    (13)

\varphi'_k(x), \; \psi'^{\,r}_{jk}(x), \quad r = 1, \dots, m-1    (14)
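The perfect-reconstruction property discussed above can be illustrated concretely. The sketch
below (an illustration we add, not part of the original paper) implements the integer lifting form of
the CDF 5/3 biorthogonal wavelet, as used in JPEG 2000, and shows that the inverse transform
recovers the signal exactly even in integer arithmetic:

```python
import numpy as np

def fwd_53(x):
    """Forward CDF 5/3 biorthogonal wavelet (integer lifting).
    x: 1-D integer array of even length. Returns (approx, detail)."""
    s = x[0::2].astype(np.int64)          # even samples
    d = x[1::2].astype(np.int64)          # odd samples
    for i in range(len(d)):               # predict step
        right = s[i + 1] if i + 1 < len(s) else s[-1]   # symmetric edge
        d[i] -= (s[i] + right) // 2
    for i in range(len(s)):               # update step
        left = d[i - 1] if i >= 1 else d[0]
        s[i] += (left + d[i] + 2) // 4
    return s, d

def inv_53(s, d):
    """Inverse transform: undo the lifting steps in reverse order."""
    s = s.copy(); d = d.copy()
    for i in range(len(s)):               # undo update (d is still intact)
        left = d[i - 1] if i >= 1 else d[0]
        s[i] -= (left + d[i] + 2) // 4
    for i in range(len(d)):               # undo predict with recovered s
        right = s[i + 1] if i + 1 < len(s) else s[-1]
        d[i] += (s[i] + right) // 2
    x = np.empty(len(s) + len(d), dtype=np.int64)
    x[0::2] = s; x[1::2] = d              # interleave back
    return x
```

Because each lifting step is individually invertible, reconstruction is exact; this is the
perfect-reconstruction property that motivates the use of biorthogonal filters here.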
3. PROPOSED MODEL
In this section, we describe the proposed models used to embed and extract the watermark for a
gray-level image. The image is decomposed using biorthogonal wavelet filters, and the
biorthogonal wavelet coefficients are used in order to make the technique robust against several
attacks while preserving imperceptibility. The embedding and extraction algorithms for gray-level
images are explained in the following sections.
The embedding algorithm uses a monochrome image as the watermark and a gray-level image
as the cover image. The first-level biorthogonal wavelet is applied to the cover image; then, for
the second-level decomposition, we consider the HL (horizontal), LH (vertical) and HH (diagonal)
subbands separately. From these second-level subbands we take the respective LH, HL and HH
subbands to embed the watermark. Figure 1 shows the flow of the embedding algorithm.
FIGURE 1: Flow of the embedding algorithm (Cover Image → First-Level DWT → LH1 →
Second-Level DWT → LH2 → Embedding of the watermark image → inverse DWT applied twice
→ Watermarked Image).
1. Apply the first-level biorthogonal wavelet to the input gray-level cover image to get the
   {LL1, LH1, HL1, HH1} subbands, as shown in Figure 2.
2. From the decomposed image of step 1, take the vertical subband LH1, of size m/2 × m/2.
   Apply the first-level biorthogonal wavelet again to LH1 to get the vertical subband LH2 (as
   shown in Figure 3), of size m/4 × m/4. In the LH2 subband, the frequency coefficient values
   are found to be zero or negative.
3. Embed the watermark into the frequency coefficients of LH2 by scanning them row by row,
   using the formula Y' = (|Y| + α)·W(i, j), where α = 0.009 and Y is an original frequency
   coefficient of the LH2 subband. If the watermark bit is zero then Y' = 0; otherwise Y' > 0.
4. Apply the inverse biorthogonal wavelet transform twice to obtain the watermarked gray-level
   image.
5. Similarly, the watermark is embedded separately into the HL (horizontal) and HH (diagonal)
   subband frequency coefficients.
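The embedding rule of step 3 can be sketched as follows. This is an illustrative reading of the
partly garbled formula, taken here as Y' = (|Y| + α)·W(i, j) with α = 0.009; it is a sketch under that
assumption, not the authors' code:

```python
import numpy as np

ALPHA = 0.009  # embedding strength (alpha) from step 3

def embed_subband(lh2, watermark):
    """Embed a monochrome (0/1) watermark into second-level subband
    coefficients, element-wise: Y' = (|Y| + alpha) * W(i, j).
    Where the watermark bit is 0 the coefficient becomes 0; where it
    is 1 the coefficient becomes strictly positive, since |Y| + alpha > 0."""
    lh2 = np.asarray(lh2, dtype=np.float64)
    w = np.asarray(watermark, dtype=np.float64)
    return (np.abs(lh2) + ALPHA) * w
```

Because the unmarked LH2 coefficients are zero or negative, the sign of each modified
coefficient alone encodes the watermark bit, which is what makes blind extraction possible.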
FIGURE 2: First-level decomposition of the cover image into the {LL1, HL1, LH1, HH1} subbands.

FIGURE 3: Second-level decomposition of the LH1 subband into the {LL2, HL2, LH2, HH2}
subbands.
FIGURE 4: Flow of the extraction algorithm (Watermarked Image → First-Level DWT → LH1 →
Second-Level DWT → LH2 → Extraction → Watermark).
1. Apply the first-level biorthogonal wavelet to the watermarked gray-level image to get the
   {LL1, LH1, HL1, HH1} subbands.
2. From the decomposed image of step 1, take the vertical subband LH1, of size m/2 × m/2.
   Apply the first-level biorthogonal wavelet again to LH1 to get the vertical subband LH2 (as
   shown in Figure 3), of size m/4 × m/4.
3. From the subband LH2, extract the watermark by scanning the frequency coefficients row by
   row: if a frequency coefficient is greater than zero, set the watermark bit to 1; otherwise set
   it to 0.
4. Similarly, the watermark is extracted from the HL (horizontal) and HH (diagonal) subband
   frequency coefficients.
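The extraction rule in step 3 is a simple sign test on the (possibly attacked) coefficients; a minimal
illustrative sketch, not the authors' code:

```python
import numpy as np

def extract_subband(lh2_w):
    """Recover watermark bits from second-level subband coefficients:
    bit = 1 where the coefficient is greater than zero, else 0."""
    return (np.asarray(lh2_w) > 0).astype(np.uint8)
```

Since a 1-bit is embedded as a strictly positive value and a 0-bit as zero (with unmarked
coefficients zero or negative), an attack must push a coefficient across zero before a bit flips,
which is the source of the scheme's robustness.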
4. EXPERIMENTAL RESULTS
The following parameters are used to evaluate the proposed scheme.

1. Standard Correlation (SC): It measures the correlation between the original and extracted
   watermarks:

   SC = \frac{\sum_{i}\sum_{j} (I(i,j) - \bar{I})(J(i,j) - \bar{J})}{\sqrt{\sum_{i}\sum_{j} (I(i,j) - \bar{I})^2} \, \sqrt{\sum_{i}\sum_{j} (J(i,j) - \bar{J})^2}}

   Here, I(i, j) is the original watermark, J(i, j) is the extracted watermark, \bar{I} is the mean of
   the original watermark and \bar{J} is the mean of the extracted watermark.

2. Normalized Correlation (NC): It measures the similarity between the original image and the
   modified image:

   NC = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N} I(i,j)\, I'(i,j)}{\sum_{i=1}^{M}\sum_{j=1}^{N} I(i,j)^2}

   where I(i, j) is the original image, I'(i, j) is the modified image, M is the height of the image
   and N is its width.

3. Mean Square Error (MSE): It measures the average of the square of the "error", the amount
   by which a pixel value of the original image differs from the corresponding pixel value of the
   modified image:

   MSE = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} [f(i,j) - f'(i,j)]^2

   where M and N are the height and width of the image respectively, f(i, j) is the (i, j)-th pixel
   value of the original image and f'(i, j) is the (i, j)-th pixel value of the modified image.

4. Peak Signal-to-Noise Ratio (PSNR): It is the ratio between the maximum possible power of a
   signal and the power of the corrupting noise that affects the fidelity of its representation, and
   is usually expressed in logarithmic decibels. For an n-bit image,

   PSNR = 10 \log_{10} \frac{(2^n - 1)^2}{MSE}
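The quality metrics above translate directly into code; a short sketch (with n = 8 for 8-bit
gray-level images):

```python
import numpy as np

def nc(orig, mod):
    """Normalized Correlation: sum(I * I') / sum(I^2)."""
    orig = np.asarray(orig, dtype=np.float64)
    mod = np.asarray(mod, dtype=np.float64)
    return float((orig * mod).sum() / (orig * orig).sum())

def mse(orig, mod):
    """Mean Square Error averaged over the M x N image."""
    diff = np.asarray(orig, dtype=np.float64) - np.asarray(mod, dtype=np.float64)
    return float((diff ** 2).mean())

def psnr(orig, mod, n_bits=8):
    """PSNR in dB: 10 * log10((2^n - 1)^2 / MSE); infinite when MSE = 0."""
    m = mse(orig, mod)
    if m == 0:
        return float("inf")
    return float(10 * np.log10((2 ** n_bits - 1) ** 2 / m))
```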
4.1 Measuring the Perceptual Quality of the Watermarked Image
In this section we discuss the effect of the embedding algorithm on the cover image in terms of
the perceptual similarity between the original image and the watermarked image, using the
Mean, Standard Deviation, RMS and Entropy. The effect of the extraction algorithm is measured
using the MSE, PSNR, NC and SC between the extracted and original watermarks. As shown in
Figure 5, the watermark is embedded by further decomposing LH1, HL1 and HH1 separately at
the second level, and the quality of the original gray-scale image and the watermarked image is
compared. The Mean, Standard Deviation, RMS and Entropy are calculated between the original
gray-level image and the watermarked image. The results show that only slight variations exist in
these parameters, indicating that the embedding algorithm modifies the content of the original
image by a negligible amount. The amount of noise added to the gray-level cover image is
calculated using the MSE and PSNR. The experimental results indicate that embedding the
watermark into the HH (diagonal) subband produces the best results, with low MSE and high
PSNR compared to the other subbands.
FIGURE 5: Effect of the embedding algorithm on the cover image (lena256.bmp), showing the
original and extracted watermarks for the LH, HL and HH subband coefficients.
FIGURE 6: Effect of Extraction algorithm from LH, HL and HH subband coefficients on Watermark
Figure 6 shows the results of watermark extraction when LH1, HL1 and HH1 are each further
decomposed at the second level; the quality of the extracted watermark is compared with the
original watermark. The MSE, PSNR, NC and SC are calculated between the extracted and
original watermarks. The results show that the extraction algorithm produces similar results for all
subbands in terms of these parameters.

4.2 Robustness against Attacks
In this section we discuss the performance of the extraction algorithm under different types of
image processing attacks on the watermarked gray-level image, such as blurring, added
salt-and-pepper noise, sharpening, Gaussian filtering and cropping.
1. Effect of Blurring: A circular averaging (pillbox) filter is applied to the watermarked gray-level
   image to analyze the effect of blurring. The pillbox filter averages the watermarked image
   within a square matrix of side 2·(disk radius) + 1. The disk radius is varied from 0.5 to 1.4
   and the effect of blurring on the extraction algorithm is analyzed. Figure 7 shows the
   extracted watermarks for different disk radii for the LH, HL and HH subbands. Figure 8 shows
   the effect of blurring in terms of the MSE, NC, SC and PSNR between the original and
   extracted watermarks. From the experimental results it was found that extraction from the
   HH subband yields NC equal to 1 for disk radii up to 1.4; the extracted watermark is highly
   correlated with the original watermark when the watermark is embedded into the HH
   subband.
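The pillbox kernel can be sketched as an averaging kernel supported on a disk. This is a
simplified illustration: MATLAB's fspecial('disk') additionally anti-aliases the fractional boundary
pixels, which this sketch omits:

```python
import numpy as np

def pillbox_kernel(radius):
    """Circular averaging kernel on a square of side 2*ceil(radius) + 1.
    Pixels whose centers lie within `radius` of the middle get equal
    weight; weights sum to 1, so flat regions pass through unchanged."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]          # integer pixel offsets
    k = (x ** 2 + y ** 2 <= radius ** 2).astype(np.float64)
    return k / k.sum()
```

Convolving the watermarked image with this kernel simulates the blurring attack described above.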
FIGURE 7: Extracted watermarks from blurred watermarked gray-level images (disk radius 0.5 to
1.4) using the LH, HL and HH subbands.
FIGURE 8: Effect of blurring on the watermarked grayscale image: (a) MSE between original and
extracted watermark, (b) NC between original and extracted watermark, (c) SC between original
and extracted watermark, (d) PSNR between original and extracted watermark, plotted against
disk radius (0.5 to 1.4) for the LH, HL and HH subbands.
2. Effect of adding salt-and-pepper noise: Salt-and-pepper noise of density d is added to the
   watermarked image I, affecting approximately d · size(I) pixels. Figure 9 shows the
   watermarks extracted from the LH, HL and HH subbands for noise densities varied from
   0.001 to 0.007. Figure 10 shows the effect of salt-and-pepper noise on the extraction
   algorithm. From the experimental results, it was found that extraction from the HH subband
   produces NC equal to 0.95; thus embedding the watermark into the HH subband is robust
   against added salt-and-pepper noise.
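Salt-and-pepper noise at density d can be simulated as below. This is a sketch of the standard
definition; the function and parameter names are ours:

```python
import numpy as np

def salt_pepper(img, density, seed=0):
    """Set approximately density * img.size pixels to 0 (pepper)
    or 255 (salt), chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    out = np.asarray(img).copy()
    hit = rng.random(out.shape) < density        # ~density fraction of pixels
    out[hit] = rng.choice(np.array([0, 255], dtype=out.dtype),
                          size=int(hit.sum()))
    return out
```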
FIGURE 9: Extracted watermarks from watermarked images with added salt-and-pepper noise
(density 0.001 to 0.007) using the LH, HL and HH subbands.
FIGURE 10: Effect of salt-and-pepper noise on the watermarked grayscale image: (a) MSE
between original and extracted watermark, (b) NC between original and extracted watermark,
(c) SC between original and extracted watermark, (d) PSNR between original and extracted
watermark, plotted against noise density (0.001 to 0.007) for the LH, HL and HH subbands.
3. Effect of Sharpening: The watermarked gray-level image is sharpened with the sharpening
   strength varied from 0.1 to 1.0, and the effect on the extraction algorithm is analyzed.

FIGURE 11: Extracted watermarks from sharpened watermarked images (sharpness 0.1 to 0.9)
using the LH, HL and HH subbands.
FIGURE 12: Effect of sharpening on the watermarked gray-level image: (a) MSE between original
and extracted watermark, (b) NC between original and extracted watermark, (c) SC between
original and extracted watermark, (d) PSNR between original and extracted watermark, plotted
against sharpness (0.1 to 1.0) for the LH, HL and HH subbands.
4. Effect of Cropping: The watermarked image is cropped by 10% to 90%, and the effect of
   cropping on the extraction algorithm is analyzed.

FIGURE 13: Cropped watermarked images and the corresponding extracted watermarks
(cropping 10% to 90%) using the LH, HL and HH subbands.
FIGURE 14: Effect of cropping on the watermarked gray-level image: (a) MSE between original
and extracted watermark, (b) NC between original and extracted watermark, (c) SC between
original and extracted watermark, (d) PSNR between original and extracted watermark, plotted
against the percentage of cropping (10% to 90%) for the LH, HL and HH subbands.
5. Effect of Gaussian filtering: The watermarked image is filtered with a Gaussian filter whose
   standard deviation (sigma) is varied from 0.1 to 1.1, and the effect on the extraction
   algorithm is analyzed.

FIGURE 15: Extracted watermarks from Gaussian-filtered watermarked images (sigma 0.1 to 1.1)
using the LH, HL and HH subbands.

FIGURE 16: Effect of Gaussian filtering on the watermarked gray-level image: (a) MSE between
original and extracted watermark, (b) NC between original and extracted watermark, (c) SC
between original and extracted watermark, (d) PSNR between original and extracted watermark,
plotted against sigma for the LH, HL and HH subbands.
5. COMPARISON
We compare the performance of our algorithm with another watermarking algorithm based on the
biorthogonal wavelet transform, proposed by Suhad Hajjara et al. [12], which likewise uses a
localized decomposition (the second-level decomposition is performed on a detail subband
resulting from the first-level decomposition). The comparison is summarized in Table 1. In the
proposed algorithm the watermark is embedded directly into the frequency coefficients, and the
robustness of the algorithm is analyzed separately for the HL, LH and HH subband coefficients.
Properties                    | Suhad Hajjara [12]                  | Proposed Algorithm
------------------------------|-------------------------------------|--------------------------------
Cover data                    | Gray-level                          | Gray-level
Watermark                     | Binary image mapped to a            | Monochrome image (logo)
                              | pseudo-random number (PRN)          |
Domain of embedding           | Frequency domain                    | Frequency domain
Type of filters               | DWT-based biorthogonal              | DWT-based biorthogonal
Frequency bands considered    | Diagonal (HH), vertical (LH)        | Diagonal (HH), vertical (LH)
for embedding                 | and horizontal (HL)                 | and horizontal (HL)

TABLE 1: Comparison of the proposed algorithm with the algorithm proposed by Suhad
Hajjara [12].
6. CONCLUSION
In this paper we proposed a novel scheme for embedding a watermark into a gray-level image.
The scheme decomposes the image using the Discrete Wavelet Transform with biorthogonal
wavelet filters and embeds the watermark bits into significant coefficients of the transform. We
use a localized decomposition, meaning that the second-level decomposition is performed on a
detail subband resulting from the first-level decomposition. For embedding and extraction we
defined separate modules for the LH, HL and HH subbands, and the performance of these
modules was analyzed for both normal watermarked images and signal-processed (attacked)
images. In all these analyses we found that embedding into and extracting from the HH
(diagonal) subband produces the best results for both attacked and normal images.
7. REFERENCES
1. Ingemar J. Cox and Matt L. Miller, "The First 50 Years of Electronic Watermarking",
   EURASIP Journal on Applied Signal Processing, Vol. 2, pp. 126-132, 2002.
2. G. Voyatzis and I. Pitas, "Protecting digital image copyrights: A framework", IEEE Computer
   Graphics and Applications, Vol. 19, pp. 18-23, Jan. 1999.
3. S. Katzenbeisser and F. A. P. Petitcolas, "Information Hiding Techniques for Steganography
   and Digital Watermarking", Artech House, UK, 2000.
4. Peter H. W. Wong, Oscar C. Au and Y. M. Yeung, "A Novel Blind Multiple Watermarking
   Technique for Images", IEEE Transactions on Circuits and Systems for Video Technology,
   Vol. 13, No. 8, August 2003.
5. M. U. Celik et al., "Lossless generalized-LSB data embedding", IEEE Transactions on Image
   Processing, 14(2), pp. 253-266, 2005.
6. N. Cvejic and T. Seppanen, "Increasing robustness of LSB audio steganography by reduced
   distortion LSB coding", Journal of Universal Computer Science, 11(1), pp. 56-65, 2005.
7. Ingemar J. Cox, Matthew L. Miller, Jeffrey A. Bloom, Jessica Fridrich and Ton Kalker,
   "Digital Watermarking and Steganography", Second edition, Morgan Kaufmann Publishers,
   2008.
8. X. Wu, J. Hu, Z. Gu and J. Huang, "A Secure Semi-Fragile Watermarking for Image
   Authentication Based on Integer Wavelet Transform with Parameters", Technical Report,
   School of Information Science and Technology, Sun Yat-Sen University, China, 2005.
9. D. Kundur and D. Hatzinakos, "Digital watermarking using multiresolution wavelet
   decomposition", Technical Report, Dept. of Electrical and Computer Engineering, University
   of Toronto, 1998.
10. C. S. Burrus, R. A. Gopinath and H. Guo, "Introduction to Wavelets and Wavelet
    Transforms: A Primer", Prentice-Hall, Inc., 1998.
11. I. Daubechies, "Ten Lectures on Wavelets", CBMS, SIAM, pp. 271-280, 1994.
12. Suhad Hajjara, Moussa Abdallah and Amjad Hudaib, "Digital Image Watermarking Using
    Localized Biorthogonal Wavelets", European Journal of Scientific Research, ISSN
    1450-216X, Vol. 26, No. 4, pp. 594-608, 2009.
A Novel Multiple License Plate Extraction Technique for Complex Background in Indian Traffic
Conditions

Abstract

Keywords: License plate recognition, Sigmoid function, Horizontal projection, Mathematical
morphology, Aspect ratio analysis, Plate compatible filter.
International Journal of Image Processing (IJIP) Volume (4): Issue (2) 106
Chirag N. Paunwala & Suprava Patnaik
1. INTRODUCTION
License plate recognition (LPR) applies image processing and character recognition technology
to identify vehicles by automatically reading their license plates. Automated license plate reading
is a particularly useful and practical approach because, apart from the existing and legally
required license plate, it assumes no additional means of vehicle identification. Although human
observation seems the easiest way to read a vehicle license plate, reading errors due to
tiredness are the main drawback of manual systems, and this is the main motivation for research
in the area of automatic license plate recognition. Because of problems such as poor image
quality, perspective distortion, disturbing characters or reflections on the vehicle surface, and
color similarity between the license plate and the vehicle body, the license plate is often difficult
to locate accurately and efficiently. Security control of restricted areas, traffic law enforcement,
surveillance systems, toll collection and parking management systems are some applications of
a license plate recognition system.
The main goal of this paper is to implement a method that recognizes license plates efficiently
under Indian conditions, where vehicles carry extra information such as the owner's name,
symbols and designs, along with differing license plate standards. Our work is not restricted to
cars but extends to many types of vehicles, such as motorcycles (on which the license plate is
small), transport vehicles that carry extra text, and soiled license plates. Our proposed algorithm
robustly detects vehicle license plates in both day and night conditions, as well as multiple
license plates contained in one image or frame, without first finding a candidate region.
The paper is organized as follows: Section 2 discusses previous work in the field of LPR.
Section 3 describes the implementation of the algorithm. Section 4 presents the experimental
results of the proposed algorithm. Sections 5 and 6 give the conclusion and references.
2. PREVIOUS WORK
Techniques based on combinations of edge statistics and mathematical morphology [1]-[4] have
produced very good results. A disadvantage is that edge-based methods alone can hardly be
applied to complex images, since they are too sensitive to unwanted edges, which may also show
a high edge magnitude or variance (e.g., the radiator region in the front view of a vehicle). When
combined with morphological steps that eliminate unwanted edges in the processed images, the
LP extraction rate becomes relatively high and fast. In [1], the conceptual model underneath the
algorithm is based on the morphological "top-hat transformation", which is able to locate small
objects of significantly different brightness [5]. This algorithm, however, with a detection rate of
80%, is highly dependent on the distance between the camera and the vehicle, as the
morphological operations relate to the dimensions of the binary objects. A similar approach was
described in [2] with some modifications and achieved an accuracy of around 93%. In [3],
candidate regions were extracted with a combination of edge statistics and top-hat
transformations, and final extraction was achieved using wavelet analysis, with a success rate of
98.61%. In [4], a hybrid license plate detection algorithm for complex backgrounds based on
histogramming and mathematical morphology was presented; it uses vertical gradient analysis
and its horizontal projection to find candidate regions, and the horizontal gradient, its vertical
projection and morphological processing of the candidate regions to extract the exact license
plate (LP) location. In [6], a hybrid algorithm based on edge statistics and morphology is proposed
which uses vertical edge detection, edge statistical analysis, hierarchy-based LP location, and
morphology for extracting the license plate. This prior-knowledge-based algorithm achieves a
very good detection rate for images acquired from a fixed distance and angle; candidate regions
in a specific position are therefore given priority, which certainly boosts the results to a high level
of accuracy, but it will not work on frames with plates of different sizes or with multiple license
plates. In [7][8], a technique was used that scans and labels pixels into components based on
pixel connectivity; measurement features are then used to detect the region of interest. In [9], the
vehicle image is scanned with a pre-defined row distance; if the number of edges is greater than
a threshold value, the presence of a plate can be assumed.
International Journal of Image Processing (IJIP) Volume (4): Issue (2) 107
Chirag N. Paunwala & Suprava Patnaik
In [10], a block-based recognition system is proposed to extract and recognize license plates of
motorcycles and vehicles on highways only. In the first stage, a block-difference method is
used to detect moving objects. According to the variance and the similarity of the MxN blocks
defined on two diagonal lines, the blocks are categorized into three classes: low-contrast,
stationary, and moving blocks. In the second stage, a screening method based on the projection
of edge magnitudes finds two peaks in the projection histograms to locate license plates.
The main shortcoming of this method is the detection of false or unwanted non-text regions
caused by the edge projection. In [11], a method using statistics such as the mean and variance of
two sliding concentric windows (SCW) was used, as shown in Figure (1). This method encounters
a problem when the borders of the license plate do not exhibit much variation from the
surrounding pixels, as with edge-based methods. Edge detection also requires a threshold that
cannot be chosen uniquely under varying conditions such as illumination.
The same authors report a success rate of 96.5% for plate localization with proper
parameterization of the method in conjunction with CCA measurements and the Sauvola
binarization method [12].
(a) (b)
FIGURE 1: (a) SCW Method, (b) Resulting Image after SCW Execution [11].
In Hough transform (HT) based methods for license plate extraction, edges in the input image
are detected first; HT is then applied to detect the LP regions. In [13], a combination of the Hough
transform and a contour algorithm was applied to the edge image. The lines that cross the
plate frame were then determined, and a rectangular-shaped object that matched the license plate
was extracted. In [14], a scan-and-check algorithm was used, followed by the Radon transform for
skew correction. The method proposed in [15] applies the HL subband feature of the 2D Discrete
Wavelet Transform (DWT) twice to significantly highlight the vertical edges of license plates and
suppress the surrounding background noise. Several promising license plate candidates can then
easily be extracted by first-order local recursive Otsu segmentation [16] and orthogonal
projection histogram analysis. Finally, the most probable candidate is selected by edge
density verification and an aspect ratio constraint.
In [17, 18], the color of the plate was used as a feature: the image was fed to a color filter, and the
output was tested for whether the candidate area had the plate's shape. In [19, 20], a technique
based on the mean-shift estimate of the gradient of a density function and the associated
iterative mode-seeking procedure was presented; building on it, the authors of [21] applied a
mean-shift procedure for color segmentation of the vehicle images to directly obtain candidate
regions that may include LP regions. In [22], the concept of enhancing the low-resolution image
was used for better extraction of characters.
None of the algorithms discussed above focuses on extracting multiple plates with different
possible aspect ratios.
[Flow diagram: the variance of the input image is compared with a threshold; contrast enhancement using the sigmoid function is applied only when the variance is below it.]
3.1 Preprocessing
This work aims at gray-intensity-based license plate extraction and hence begins with color-to-gray
conversion using (1), where I(i,j) is the gray image array and A(i,j,1), A(i,j,2), A(i,j,3) are the
R, G, B values of the original image, respectively. For accurate location of the license plate, the
vehicle must be clearly visible irrespective of whether the image is captured during day or night
or under non-homogeneous illumination. Sometimes the image may be too dark or contain blur,
making the task of extracting the license plate difficult. In order to recognize the license plate
even at night, contrast enhancement is important before further processing. One important
statistical parameter that provides information about the visual properties of the image is the
variance, and the condition for contrast enhancement is based on this parameter. First, the
variance of the image is computed. To reduce computational complexity, the proposed
implementation begins with thresholding the variance as a selection criterion for frames requiring
contrast enhancement. If the value is greater than the threshold, the corresponding image
possesses good contrast; if the variance is below the threshold, the image is considered to have
low contrast, and contrast enhancement is applied to it. This variance-based condition lets the
system automatically recognize whether the image was taken in daylight or at night.
In this work, the first step of contrast enhancement is to apply unsharp masking to the original
image, followed by the sigmoid function for contrast enhancement. The sigmoid function, also
known as the logistic function, is a continuous nonlinear activation function; the name sigmoid
comes from the fact that the function is "S" shaped. The sigmoid is similar to the step function,
but with the addition of a region of uncertainty [23]. It is a range-mapping approach with soft
thresholding. With x as the input and α as a gain term, the sigmoid function is given by:
sigmoid function is given by:
1
f ( x) (2)
1 e x
For faultless license plate extraction, identification of edges is very important, as the license plate
region consists of edges of definite size and shape. In blurry images edge identification is
unreliable, so sharpening of the edges is necessary. Unsharp masking highlights areas that
contain edges or fine detail. This is done by generating a blurred copy of the original image using
a Laplacian filter and then subtracting it from the original image, as shown in (3).
I_sharp(i, j) = I_original(i, j) - I_blur(i, j)        (3)
The resultant image obtained from (3) is then multiplied by a constant c and added to the original
image, as shown in (4):

I_enhanced(i, j) = I_original(i, j) + c · I_sharp(i, j)        (4)

This step highlights and enhances the finer details while larger structures remain unaffected. The
value c = 0.7 was chosen from experimentation.
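The unsharp-masking step of equations (3)-(4) can be sketched as follows. The 3x3 box blur and the NumPy formulation are assumptions of this sketch, not the paper's exact implementation (which generates the blurred copy with a Laplacian filter):

```python
import numpy as np

def unsharp_mask(img, c=0.7):
    """Equations (3)-(4): detail = original - blurred, then
    enhanced = original + c * detail with c = 0.7 as in the paper.
    A 3x3 box blur stands in for the smoothing step here; the
    paper generates its blurred copy with a Laplacian filter."""
    img = img.astype(float)
    p = np.pad(img, 1, mode='edge')
    rows, cols = img.shape
    blur = sum(p[i:i + rows, j:j + cols]
               for i in range(3) for j in range(3)) / 9.0
    detail = img - blur                           # equation (3)
    return np.clip(img + c * detail, 0.0, 255.0)  # equation (4)
```

Flat regions have zero detail and pass through unchanged; only areas with local intensity differences are amplified.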
In the next step, a smoothing average window of size MxM is applied to the output image
obtained from (4). Since edge detection follows, the value of M is set to 3. The mean at each
location is then compared with a predefined threshold t. If the pixel value at that location is higher
than the threshold it remains unchanged; otherwise the pixel value is changed using the sigmoid
function of (2).
I_enhance(i, j) = { p,                    if p > t
                  { p^b / (1 + e^(-p)),   if p ≤ t        (5)
where p is the pixel value of the enhanced image I(i,j), and b, which determines the degree of
contrast needed, varies in the range 1.2 to 2.6 based on experimentation. Figure (3) shows the
results of contrast enhancement using the sigmoid function; after applying the algorithm, details
of the given input image can easily be viewed.
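The variance-gated enhancement described above can be sketched as below. The variance threshold, the dark-pixel threshold t, and the exact sigmoid scaling are illustrative assumptions of this sketch, since the paper does not give a numeric variance threshold:

```python
import numpy as np

def enhance_if_low_contrast(img, var_thresh=1500.0, t=128, b=1.8):
    """Variance-gated contrast enhancement. If the image variance
    exceeds var_thresh the frame is judged to have good contrast
    and is returned unchanged; otherwise pixels below the threshold
    t are remapped with a sigmoid whose gain b sets the degree of
    contrast (the paper reports b in 1.2-2.6). var_thresh, t, and
    the sigmoid scaling are illustrative assumptions."""
    img = img.astype(float)
    if img.var() > var_thresh:
        return img                      # good contrast: leave untouched
    out = img.copy()
    low = img < t
    x = img[low] / 255.0                # normalise to [0, 1]
    # sigmoid remap in the spirit of (2)/(5); the steepness 6*b is a
    # choice of this sketch, not a value from the paper
    out[low] = 255.0 / (1.0 + np.exp(-6.0 * b * (x - 0.5)))
    return out
```

Gating on the variance is what lets the pipeline skip enhancement for daylight frames that already have good contrast.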
FIGURE 3: Original Low Contrast Image and Enhanced Image using Sigmoid Function.
determined, which represents the position of the license plate region. We can thus roughly locate
the horizontal position candidates of the license plate from the gradient value using (6).
g_v(i, j) = f(i, j+1) - f(i, j)        (6)
Figure 4 shows the original gray scale image and the image after finding out vertical edges from
the original.
A ∘ B = (A ⊖ B) ⊕ B        (7)
A • B = (A ⊕ B) ⊖ B        (8)
In the general scenario, the license plate is white or yellow (for public transport in India) with
black characters; therefore we begin with the closing operation, as shown in Figure 5(a). Then, to
erase white pixels that are not characters, an opening operation with a vertical SE whose height
is less than the minimum license plate character height is used, as shown in Figure 5(b).
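A minimal NumPy sketch of this closing-then-opening sequence with a vertical structuring element; the helper names and the boundary handling are assumptions of the sketch:

```python
import numpy as np

def _dilate(img, h):
    """Vertical dilation: column-wise max over a window of height h."""
    pad = h // 2
    p = np.pad(img, ((pad, pad), (0, 0)), mode='constant')
    return np.max([p[i:i + img.shape[0]] for i in range(h)], axis=0)

def _erode(img, h):
    """Vertical erosion: column-wise min; the border is padded with
    foreground (1) so full-height strokes survive at image edges."""
    pad = h // 2
    p = np.pad(img, ((pad, pad), (0, 0)),
               mode='constant', constant_values=1)
    return np.min([p[i:i + img.shape[0]] for i in range(h)], axis=0)

def close_then_open(binary, h=3):
    """Closing (dilate, then erode) followed by opening (erode, then
    dilate) with a vertical SE of height h, per equations (7)-(8);
    the opening removes white blobs shorter than h."""
    closed = _erode(_dilate(binary, h), h)   # closing, eq. (8)
    return _dilate(_erode(closed, h), h)     # opening, eq. (7)
```

An isolated white pixel is removed by the opening, while a full-height vertical stroke (a character edge) survives both operations.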
From the last step, it is observed that regions with larger vertical gradient values roughly
represent the license plate region, so the license plate region tends to have a large horizontal
projection of the vertical gradient variance. Based on this feature of license plates, we calculate
the horizontal projection of the gradient variance using (9).
T_H(i) = Σ_{j=1..n} g_v(i, j)        (9)
There may be many burrs in the horizontal projection; to smooth these burrs in the discrete
curve, a Gaussian filter is applied as shown in (10).
T'_H(i) = (1/k) · [ T_H(i) + Σ_{j=1..w} ( T_H(i+j) + T_H(i−j) ) · h(j, σ) ]

where h(j, σ) = e^(−(jσ)²/2) and k = 2 · Σ_{j=1..w} h(j, σ) + 1        (10)
In (10), T_H(i) represents the original projection value, T'_H(i) the filtered projection value, and i
changes from 1 to n, where n is the number of rows. w is the width of the Gaussian operator,
h(j, σ) is the Gauss filter, and σ represents the standard deviation. After many experiments, the
practicable values of the Gauss filter parameters were chosen as w = 6 and σ = 0.05. The result
of smoothing the horizontal projection with the Gauss filter is shown in Figure 6.
[Figure 6 plot: smoothed horizontal projection of the gradient variance (horizontal projection versus number of rows).]
As shown in Figure 6, some rows and columns at the top and bottom are discarded from the
main image on the assumption that the license plate is not part of that region, thereby reducing
computational complexity. One of the wave ridges in Figure 6 must represent the horizontal
position of the license plate, so the peaks and valleys should be checked and identified. Many
vehicles have poster signs in the back window or on other parts of the vehicle that would deceive
the algorithm. Therefore, we use a threshold T to locate the candidates for the horizontal position
of the license plate. The threshold is calculated by (11), where m represents the mean of the
filtered projection value and wt a weight parameter.
T = wt · m        (11)

where wt = 1.2. If T'_H(i) is larger than or equal to T, the corresponding rows are considered a
probable region of interest.
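Equations (6), (9), (10), and (11) can be combined into one row-screening routine. The sketch below uses NumPy, takes the gradient magnitude for (6), and its function name is hypothetical:

```python
import numpy as np

def candidate_rows(gray, w=6, sigma=0.05, wt=1.2):
    """Screen rows for plate candidates: vertical gradient (6),
    row-wise projection (9), Gaussian smoothing (10) with the
    paper's w = 6 and sigma = 0.05, and mean-based threshold (11)
    with wt = 1.2. Returns a boolean mask over image rows."""
    g = np.abs(np.diff(gray.astype(float), axis=1))   # (6) vertical edges
    TH = g.sum(axis=1)                                # (9) projection per row
    j = np.arange(-w, w + 1)
    h = np.exp(-(j * sigma) ** 2 / 2)                 # symmetric Gauss kernel
    h /= h.sum()                                      # normalisation k of (10)
    TH_s = np.convolve(TH, h, mode='same')            # (10) smoothed projection
    T = wt * TH_s.mean()                              # (11) threshold
    return TH_s >= T
```

Rows containing dense vertical edges (character strokes) project well above the mean-based threshold and survive the screen; smooth background rows do not.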
Figure 7(a) shows the image containing the rows with higher horizontal projection values. A
sequence of morphological operations is applied to this image to connect the edge pixels and
filter out the non-license-plate regions. The result of this operation is shown in Figure 7(b).
FIGURE 7: (a) Remaining Candidate Regions after Thresholding (b) After a Sequence of Morphological Operations.
In the subsequent step, connected component analysis is used to locate the coordinates of the
8-connected components. The minimum rectangle enclosing each connected component stands
as a candidate for the vehicle license plate. The result of connected component analysis is
shown in Figure 8.
shape feature of license plates. The aspect ratio is defined as the ratio of the height to the width
of the region's rectangle. From experimentation, components are discarded from the eligible
license plate regions if (1) their height is less than 7 pixels and width less than 60 pixels, (2) their
height is greater than 60 or width greater than 260 pixels, (3) the difference between their width
and height is less than 30, or (4) their height-to-width ratio is less than 0.2 or greater than 0.7.
For transport vehicles and vehicles with two-row license plates, the aspect ratio varies around
0.6. In the aspect ratio analysis the third parameter is crucial, as it helps discard components that
satisfy the first two conditions.
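The four rejection rules can be sketched as a simple geometric screen (the function name is hypothetical; w and h are the component's width and height in pixels):

```python
def plausible_plate(w, h):
    """Geometric screen from the paper's four rejection rules; a
    component survives only if none of them fires."""
    if h < 7 and w < 60:            # rule 1: too small
        return False
    if h > 60 or w > 260:           # rule 2: too large
        return False
    if abs(w - h) < 30:             # rule 3: nearly square
        return False
    r = h / w
    if r < 0.2 or r > 0.7:         # rule 4: aspect ratio out of range
        return False
    return True
```

Rule 3 is the one that rejects square-ish blobs (logos, lamps) that pass the pure size checks of rules 1 and 2.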
Vertical edges with scanning lines (five candidate components):

Count at (H/3, H/2, H-H/3):  12,18,10  | 15,14,20  | 12,11,16  | 44,46,42    | 39,42,45
Comments:                    Non LP    | Non LP    | Non LP    | Accepted    | Accepted
                             component | component | component | as LP       | as LP
4. EXPERIMENTAL RESULTS
We have divided the vehicle images into the following categories: images consisting of (1) a
single vehicle, and (2) more than one vehicle. Both categories are further subdivided into day and
night conditions, soiled license plates, plates with shadows, and blurry conditions.
As the first step toward this goal, a large image data set of license plates was collected and
grouped according to several criteria such as type and color of plates, illumination conditions,
various angles of vision, and indoor or outdoor images. The proposed algorithm is tested on a
large database consisting of 1000 vehicle images of Indian conditions as well as the database
received from [24].
The proposed algorithm detects license plates successfully with 99.1% accuracy under various
conditions. Table 2 and Table 3 compare the proposed algorithm with some existing algorithms.
The proposed method is implemented on a personal computer with an Intel Pentium Dual-Core
1.73 GHz CPU and 1 GB DDR2 RAM using Matlab v.7.6.
6. REFERENCES
[1] F. Martin, M. Garcia and J. L. Alba. “New methods for Automatic Reading of VLP’s (Vehicle
License Plates),” in Proc. IASTED Int. Conf. SPPRA, pp: 126-131, 2002.
[2] C. Wu, L. C. On, C. H. Weng, T. S. Kuan, and K. Ng, “A Macao License Plate Recognition
system,” in Proc. 4th Int. Conf. Mach. Learn. Cybern., China, pp. 4506–4510, 2005.
[3] Feng Yang,Fan Yang. “Detecting License Plate Based on Top-hat Transform and Wavelet
Transform”, ICALIP, pp:998-2003, 2008
[4] Feng Yang, Zheng Ma. “Vehicle License Plate Location Based on Histogramming and
Mathematical Morphology”, Automatic Identification Advanced Technologies, 2005. pp:89 –
94, 2005
[5] R.C. Gonzalez, R.E. Woods, "Digital Image Processing", PHI, second ed., pp: 519-560 (2006)
[6] B. Hongliang and L. Changping. “A Hybrid License Plate Extraction Method Based on Edge
Statistics and Morphology,” in Proc. ICPR, pp. 831–834, 2004.
[7] W. Wen, X. Huang, L. Yang, Z. Yang and P. Zhang, “The Vehicle License Plate Location
Method Based-on Wavelet Transform”, International Joint Conference on Computational
Sciences and Optimization, pp:381-384, 2009
[8] P. V. Suryanarayana, S. K. Mitra, A. Banerjee and A. K. Roy. “A Morphology Based Approach
for Car License Plate Extraction”, IEEE Indicon, vol.-1, pp: 24-27, 11 - 13 Dec. 2005
[9] H. Mahini, S. Kasaei, F. Dorri, and F. Dorri. “An efficient features–based license plate
localization method,” in Proc. 18th ICPR, Hong Kong, vol. 2, pp. 841–844, 2006.
[10] H.-J. Lee, S.-Y. Chen, and S.-Z. Wang, “Extraction and Recognition of License Plates of
Motorcycles and Vehicles on Highways,” in Proc. ICPR, pp. 356–359, 2004.
[11] C. Anagnostopoulos, I. Anagnostopoulos, E. Kayafas, and V. Loumos. “A License Plate
Recognition System for Intelligent Transportation System Applications”, IEEE Trans. Intell.
Transp. Syst., 7(3), pp. 377– 392, Sep. 2006.
[12] J. Sauvola and M. Pietikäinen, “Adaptive Document Image Binarization,” Pattern
Recognition, 33(2), pp. 225–236, Feb. 2000.
[13] T. D. Duan, T. L. H. Du, T. V. Phuoc, and N. V. Hoang, “Building an automatic vehicle
license-plate recognition system,” in Proc. Int. Conf. Computer Sci. (RIVF), pp. 59–63, 2005.
[14] J. Kong, X. Liu, Y. Lu, and X. Zhou. “A novel license plate localization method based on
textural feature analysis,” in Proc. IEEE Int. Symp. Signal Process. Inf. Technol., Athens,
Greece, pp. 275–279, 2005.
[15] M. Wu, L. Wei, H. Shih and C. C. Ho. “License Plate Detection Based on 2-Level 2D Haar
Wavelet Transform and Edge Density Verification”, IEEE International Symposium on
Industrial Electronics (ISlE), pp: 1699-1705, 2009.
[16] N.Otsu. “A Threshold Selection Method from Gray-Level Histograms”, IEEE Trans. Sys., Man
and Cybernetics, 9(1), pp.62-66, 1979.
[17] X. Shi,W. Zhao, and Y. Shen, “Automatic License Plate Recognition System Based on Color
Image Processing”, 3483, Springer-Verlag, pp. 1159–1168, 2005.
[18] Shih-Chieh Lin, Chih-Ting Chen , “Reconstructing Vehicle License Plate Image from Low
Resolution Images using Nonuniform Interpolation Method” International Journal of Image
Processing, Volume (1): Issue (2), pp:21-29,2008
[19] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Trans. Pattern Anal. Mach.
Intell., 17(8), pp. 790–799, Aug. 1995.
[20] D. Comaniciu and P. Meer. “Mean shift: A Robust Approach Towards Feature Space
Analysis,” IEEE Trans. Pattern Anal. Mach. Intell., 24(5), pp. 603–619, May 2002
[21] W. Jia, H. Zhang, X. He, and M. Piccardi, “Mean shift for accurate license plate localization,”
in Proc. 8th Int. IEEE Conf. Intell. Transp. Syst., Vienna, pp. 566–571, 2005.
[22] Saeed Rastegar, Reza Ghaderi, Gholamreza Ardeshipr & Nima Asadi, “An intelligent control
system using an efficient License Plate Location and Recognition Approach”, International
Journal of Image Processing (IJIP) Volume(3), Issue(5), pp:252-264, 2009
[23] Naglaa Yehya Hassan, Norio Aakamatsu, “Contrast Enhancement Technique of Dark Blurred
Image”, IJCSNS International Journal of Computer Science and Network Security, 6(2),
pp:223-226, February 2006
[24] http://www.medialab.ntua.gr/research/LPRdatabase.html
[25] Ching-Tang Hsieh, Yu-Shan Juan, Kuo-Ming Hung, “Multiple License Plate Detection for
Complex Background”, Proceedings of the 19th International Conference on Advanced
Information Networking and Applications, pp.389-392, 2005.
Jignesh Sarvaiya, Suprava Patnaik & Hemant Goklani
Abstract
1. INTRODUCTION
Image registration is a fundamental task in image processing used to align two different images.
Given two images to be registered, image registration estimates the parameters of the
geometric transformation model that maps the sensed image back to its reference image [1].
In all cases of image registration, the main goal is to design a robust algorithm that performs
automatic image registration. However, because of the diversity in how images are acquired,
their contents, and the purpose of their alignment, it is almost impossible to design a universal
method for image registration that fulfills all requirements and suits all types of applications [2][16].
Many image registration techniques have been proposed and reviewed [1], [2], [3]. Image
registration techniques can generally be classified into two categories [15]. The first category
utilizes image intensity to estimate the parameters of a transformation between two images using
an approach involving all pixels of the image. The second category extracts a set of feature
points from an image and utilizes only these feature points, instead of all image pixels, to obtain
the transformation parameters. In this paper, a new algorithm for image registration is proposed.
The proposed algorithm is based on three main steps: feature extraction, correspondence
between feature points, and transformation parameter estimation.
The proposed algorithm utilizes a new approach that exploits a nonsubsampled directional
multiresolution image representation, called the nonsubsampled contourlet transform (NSCT), to
extract significant image features from the reference and sensed images across spatial and
directional resolutions, forming two sets of extracted feature points, one for each image. Like the
wavelet transform, the contourlet transform has multiscale and time-frequency localization
properties; in addition, it can capture a high degree of directionality and anisotropy. Due to its
rich set of basis functions, the contourlet can represent a smooth contour with fewer coefficients
than wavelets can. Significant points on the obtained contour are then considered as feature
points for matching. The next step, correspondence between extracted feature points, is
performed using a Zernike moment based similarity measure. This correspondence is evaluated
using a circular neighborhood centered on each feature point. Among the various types of
moments available, Zernike moments are superior in terms of their orthogonality, rotation
invariance, low sensitivity to image noise [3], fast computation, and ability to provide a faithful
image representation [4]. The transformation parameters required to map the sensed image onto
its reference image are then estimated by solving a least-squares minimization problem using
the positions of the two sets of feature points. Experimental results show that the proposed image
registration algorithm achieves acceptable registration accuracy and robustness against several
image deformations and image processing operations.
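The final least-squares step can be sketched for a similarity (rotation + scale + translation) model; the model choice and the function name are assumptions of this sketch, since the paper does not spell out its exact transformation model here:

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares estimate of a 2-D similarity transform mapping
    src points to dst points: x' = a*x - b*y + tx, y' = b*x + a*y + ty
    with a = s*cos(theta), b = s*sin(theta). Returns scale, rotation
    in degrees, and translation. This linearisation is one common
    choice, assumed here rather than taken from the paper."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.c_[src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)]
    A[1::2] = np.c_[src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)]
    a, b, tx, ty = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)[0]
    s = np.hypot(a, b)
    theta = np.degrees(np.arctan2(b, a))
    return s, theta, (tx, ty)
```

With at least two matched point pairs the system is determined; with more pairs the least-squares solution averages out localization noise in the feature points.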
The rest of this paper is organized as follows. In section 2 the basic theory of NSCT is discussed.
In section 3 the proposed algorithm is described in detail. In section 4 experimental results of the
performance of the algorithm are presented and evaluated. Finally, conclusions with a discussion
are given section 5.
version of the contourlet transform. To obtain shift invariance, the NSCT is built upon iterated
nonsubsampled filter banks.
The design of the NSCT is based on the nonsubsampled pyramid structure (NSP), which
ensures the multiscale property, and nonsubsampled directional filter banks (NSDFB), which
provide directionality [13]. Fig. 1(a) illustrates an overview of the NSCT. The structure consists of
a bank of filters that splits the 2-D frequency plane into the subbands illustrated in Fig. 1(b).
(a) (b)
Figure 1: Nonsubsampled Contourlet Transform (a) NSFB structure that implements the NSCT.
(b) Idealized frequency partitioning obtained with the proposed structure [7].
The pyramid filters satisfy the Bezout (perfect reconstruction) identity H0(z)G0(z) + H1(z)G1(z) = 1.
Figure 2: Multiscale decomposition and construction of nonsubsampled pyramids by iterated filter banks.
(a) (b)
Figure 3: Nonsubsampled pyramid is a 2-D multiresolution expansion. (a) Three stage pyramid
decomposition. (b) Sub bands on the 2-D frequency plane [7].
More directional resolutions are obtained at higher scales by combining NSP filters and NSDFB
to produce wedge-like subbands. The result is a tree-structured filter bank that splits the 2-D
frequency plane into directional wedges [8]. This results in a tree composed of two-channel
NSFBs. Fig. 4 illustrates a four-channel decomposition.
(a) (b)
Figure 4: Four channel nonsubsampled directional filter bank constructed with two channel fan filter banks.
(a) Filtering structure. (b) Corresponding frequency decomposition [7].
In this section, the proposed registration algorithm is presented in detail. We take two images to
be aligned: an image without distortions, considered the reference or base image, and another
image with deformations, considered the sensed image (also called the distorted or input image).
The problem of image registration is essentially the estimation of the transformation parameters
using the reference and sensed images. The transformation parameter estimation approach
used in this paper is based on feature points extracted from the reference image I and the
sensed image I', which is geometrically distorted. The proposed registration process is carried
out in three main steps: feature point extraction, finding correspondence between feature points,
and transformation parameter estimation. These can be explained in detail as follows:
(i) Compute the NSCT coefficients of reference image and sensed image for N levels and L
directional subbands.
(ii) At each pixel, compute the maximum magnitude of all directional subbands at a specific level.
We call this frame “maxima of the NSCT coefficients”.
(iii) A thresholding procedure is then applied to the NSCT maxima image in order to eliminate
non-significant feature points. A feature point is considered only if NSCT maxima > Th, where
Th_j = C(σ_j + µ_j), C is a user-defined parameter, σ_j is the standard deviation, and µ_j is the
mean of the NSCT maxima image at a specific level 2^j. The locations of the thresholded
NSCT maxima P_i (i = 1, 2, ..., K) are taken as the extracted feature points, where P_i = (x_i, y_i)
are the coordinates of point P_i and K is the number of feature points. An example of the
feature points detected from the reference image is illustrated in Fig. 5.
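Step (iii) can be sketched as follows, assuming the NSCT maxima image has already been computed by steps (i)-(ii) with an NSCT implementation (not included in this sketch):

```python
import numpy as np

def feature_points(maxima, C=4.0):
    """Step (iii): keep locations of the NSCT maxima image whose
    value exceeds Th = C * (sigma + mu). `maxima` is the per-pixel
    maximum magnitude over all directional subbands at one level,
    as produced by steps (i)-(ii)."""
    Th = C * (maxima.std() + maxima.mean())
    ys, xs = np.nonzero(maxima > Th)
    return list(zip(xs.tolist(), ys.tolist()))   # P_i = (x_i, y_i)
```

Because Th scales with both the mean and the spread of the maxima image, only responses well above the typical background survive as feature points.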
(a) (b)
Figure 5: Feature point extraction: (a) Reference image (b) NSCT maxima image marked by extracted 35
feature points when N is 2.
Initially the number of levels taken is 2, but for the extraction of robust feature points, necessary
under large geometric deformations, the level N is increased in the proposed algorithm.
[i] For every extracted feature point P_i, select a circular neighbourhood of radius R centred at
this point and construct a Zernike moment descriptor vector P_z = [ |Z_{p,q}| ],
where |Z_{p,q}| is the magnitude of the Zernike moment of nonnegative integer order p, with
p − |q| even and |q| ≤ p. While higher-order moments carry fine details of the image, they are
more sensitive to noise than lower-order moments [5]. Therefore the highest order used in this
algorithm is selected as a compromise between noise sensitivity and the information content of
the moments. The Zernike moments of order p are defined as
Z_pq = ((p + 1)/π) · Σ_x Σ_y V*_pq(r, θ) · A(x, y)        (2)
where x² + y² ≤ 1, r = (x² + y²)^(1/2), and θ = tan⁻¹(y/x); x and y are normalised pixel locations
in the range −1 to +1, lying on an image-size grid. Accordingly, the radius r has a maximum value
of one. Fig. 5(b) shows the unit-radius circle along with the significant feature points for the
reference image. In the above equation, V*_pq denotes the complex conjugate of the Zernike
polynomial of order p and repetition q, which can be defined as
V_pq(r, θ) = R_pq(r) · e^(iqθ)        (3)
R_pq(r) = Σ_{s=0..(p−|q|)/2} (−1)^s · (p − s)! / [ s! · ((p + |q|)/2 − s)! · ((p − |q|)/2 − s)! ] · r^(p − 2s)        (4)
R_pq depends on the distance of the feature point from the image centre. Hence the proposed
method is limited to working well for rotations about the image axis passing through the image
centre. Fig. 6 illustrates the two images: the reference image and the sensed image rotated by
60 degrees about the central image axis. The Zernike moment vector magnitude for a feature
pair is shown in Table 1.
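The radial polynomial of equation (4) is easy to check numerically; the sketch below reproduces the value R31(0.4518) ≈ −0.6269 used in the paper's worked example:

```python
from math import factorial

def R(p, q, r):
    """Zernike radial polynomial R_pq(r) of equation (4);
    requires |q| <= p and p - |q| even."""
    q = abs(q)
    return sum((-1) ** s * factorial(p - s)
               / (factorial(s)
                  * factorial((p + q) // 2 - s)
                  * factorial((p - q) // 2 - s))
               * r ** (p - 2 * s)
               for s in range((p - q) // 2 + 1))
```

For p = 3, q = 1 the sum reduces to R31(r) = 3r³ − 2r, which at r = 0.4518 gives −0.6269, matching the worked example later in the paper.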
Figure 6: Correspondence between feature points in the reference and sensed image (rotated by 60 deg)
TABLE 1: Zernike moment magnitude for a feature point pair from reference image and sensed image
(rotated by 60 deg).
[ii] The feature points of the reference image are matched with those of the sensed image by
computing the correlation coefficient of their descriptor vectors; the matched points are those
that give the maximum correlation coefficient value. The correlation coefficient C of two feature
vectors V1 and V2 is defined as

C = Σ_i (V1(i) − m1)(V2(i) − m2) / [ Σ_i (V1(i) − m1)² · Σ_i (V2(i) − m2)² ]^(1/2)

where m1 and m2 are the mean values of the two vectors V1 and V2, respectively.
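The correlation matching of step [ii] amounts to the standard Pearson correlation coefficient; a minimal sketch:

```python
import numpy as np

def corr(v1, v2):
    """Pearson correlation coefficient of two descriptor vectors
    (the standard form, consistent with the means m1 and m2 named
    in the text)."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    d1, d2 = v1 - v1.mean(), v2 - v2.mean()
    return (d1 @ d2) / np.sqrt((d1 @ d1) * (d2 @ d2))
```

The coefficient is +1 for perfectly proportional descriptors and −1 for anti-correlated ones, so the best match is simply the candidate with the largest value.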
I = [  5  10  15  20  25  30  35  40
      10  20  30  40  50  60  70  80
      15  30  45  60  75  95 105 120
      20  40  45  65  85 105 125 135
      25  50  60  85 100 115 130 145
      30  60  75 105 115 130 145 160
      35  70  90 125 130 145 160 175
      40  80 105 135 145 160 175 190 ]
Normalize the pixel locations so that x and y vary from −1 to +1 with a step size of 0.2857. Apply
an 8 x 8 mask, with pixel value one within the unit circle, as required for the calculation of the
Zernike moments.
Mask = [ 0 0 0 0 0 0 0 0
         0 0 1 1 1 1 0 0
         0 1 1 1 1 1 1 0
         0 1 1 1 1 1 1 0
         0 1 1 1 1 1 1 0
         0 1 1 1 1 1 1 0
         0 0 1 1 1 1 0 0
         0 0 0 0 0 0 0 0 ]
To find the Zernike moment at grid location (3, 4), the values of r and θ are 0.4518 and −1.89
respectively, which form the 12th element of the r and θ vectors. The pixel intensity is I(3, 4) = 60.
For p = 3 and q = 1, satisfying the above conditions, equation (4) gives the polynomial value
R31 = −0.6269. Substituting this value into equation (3), we get V31(r, θ) = 0.19825 + 0.59485i.
Applying all the above values in equation (2), we get Z31 = 15.29 + 45.4588i. Finally, in log scale
we get [abs(log(Z31))] = 4.0661.
4. EXPERIMENTAL RESULTS
In this section, the performance of the proposed algorithm is evaluated by applying different
types of distortions. A reference image is geometrically distorted, and in addition noise is added
or the image is compressed or expanded. The parameters of the geometric distortion are
obtained by applying the proposed algorithm to the reference and sensed images. A set of
simulations has been performed to assess the performance of the proposed algorithm with
respect to registration accuracy and robustness.
(a) (b)
(c) (d)
(e) (f)
(g)
Figure 7: Experimental results (a) Reference image (b) Sensed image (rotated by 37 deg) (c) NSCT
maxima image of reference image, N is 2 (d) NSCT maxima image of sensed image, N is 2 (e) Registered
image (f) Registered image overlaid on Reference image (g) Enlarged portion of the overlaid image.
A gray-level "boat" image of size 256x256 is used as the reference image. The simulation results
were obtained using the MATLAB software package. The experiments were performed according
to the following settings: NSCT decomposition of all test images, performed using the NSCT
toolbox, was carried out with N = 2 resolution levels; to increase the capability of the proposed
algorithm for larger distortions, the resolution level N is increased to 3. The parameter C is
user-defined and ranges from 4 to 8, and the Zernike moment descriptor neighbourhood radius is
R = 20. Results of registering the geometrically distorted images, combined with other image
processing operations, are shown in Fig. 7. In this figure, the reference and sensed images are
shown first, then the NSCT maxima images of both. Finally, the registered image overlaid on the
reference image is shown. To highlight the registration accuracy, a small square section of the
reference image which is not available in the sensed image after rotation has been magnified
along with the connected features from the sensed image. The perfect alignment between the
two images justifies the registration accuracy.
The applied distortions/transformations are shown in the figures below. The estimated transformation parameters are very close to the actually applied parameters, which illustrates the accuracy of image recovery in the presence of noise, coarse compression, or expansion of the image. Figures 8 to 11 show simulation results for different rotations and scales, demonstrating the accuracy of registration.
Figure 8: (a) Reference image (b) Sensed image (rotated by 100 degrees)
(c) Registered image (d) Registered image overlaid on Reference image (N = 3).
Figure 9: (a) Reference image (b) Sensed image (rotated by 80 degrees, scaled by 0.8) (c) Registered image (d) Registered image overlaid on Reference image (N = 2).
Figure 10: (a) Reference image (b) Sensed image (rotated by 80 degrees, scaled by 2.2) (c) Registered image (d) Registered image overlaid on Reference image (N = 2).
Figure 11: (a) Reference image (b) Sensed image (rotated by 10 degrees, with Gaussian noise of mean 0 and variance 0.02) (c) Registered image (d) Registered image overlaid on Reference image (N = 2).
5. CONCLUSION
The proposed algorithm explores the major elements of feature-based automated image registration. A nonsubsampled contourlet transform (NSCT) based feature point extractor is used to extract significant image feature points across spatial and directional resolutions, and a Zernike-moment-based similarity measure is used for feature correspondence. The experimental results clearly indicate that the registration accuracy and robustness are acceptable, confirming the effectiveness of the proposed NSCT-based feature point extraction approach for image registration.
6. REFERENCES
[1] Brown L G., “A survey of image registration techniques”. ACM Computing Surveys, 24(4),
325-376, 1992.
[2] A. Ardeshir Goshtasby, “2-D and 3-D Image Registration for Medical, Remote Sensing, and
Industrial Applications”, A John Wiley & Sons, Inc., Publication, USA.
[3] B. Zitova and J. Flusser, “Image Registration methods: A Survey”. Image Vision Computing,
21(11), 977-1000, 2003.
[4] A. Khotanzad and Y.H. Hong, “Invariant Image Recognition by Zernike moment”. IEEE Trans.
PAMI, 12(5), 489-497, 1990.
[5] Cho-Huak and R.T. Chin, “On Image Analysis by the method of moments”. IEEE Trans.
PAMI, 10(4), 496-513, 1988.
[6] M.N. Do and M. Vetterli, “The Contourlet Transform: an Efficient Directional multiresolution
Image Representation”. IEEE Trans. on Image Processing, 14(12), 2091-2106, 2005.
[7] A.L.Cunha, J. Zhou, and M.N. Do, “The Nonsubsampled Contourlet Transform: Theory,
Design, and Applications”. IEEE Trans. on Image Processing, 15(10), 3089-3101, 2006.
[8] R. H. Bamberger and M. J. T. Smith, “A filter bank for the directional decomposition of
images: theory and design”. IEEE Trans. on Signal Processing, 40(7), 882-893, 1992.
[9] S. X. Liao and M. Pawlak, “On the Accuracy of Zernike Moments for Image Analysis”. IEEE
Trans. on Pattern Analysis and Machine Intelligence, 20(12), 1358-1364, 1998.
[10] J. Zhou, A.L. Cunha, and M.N. Do, “The Nonsubsampled Contourlet Transform: Construction
and Application in Enhancement”. In Proceedings of IEEE Int. Conf. on Image Processing,
ICIP 2005, (1), 469-472, 2005.
[11] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code”. IEEE
Trans. on Commun., 31(4), 532–540, 1983.
[12] M. N. Do and M. Vetterli, “Framing pyramids”. IEEE Trans. Signal Process. 51(9),2329-2342,
2003.
[13] C. Serief, M. Barkat, Y. Bentoutou and M. Benslama, “Robust feature points extraction for
image registration based on the nonsubsampled contourlet transform”. International Journal
Electronics Communication, 63( 2), 148-152, 2009.
[14] Manjunath B S and Chellappa R. “A feature based approach to face recognition”. In
Proceedings of IEEE conference on computer vision and pattern recognition, Champaign,
373–378, 1992.
[15] M. S. Holia and V. K. Thakar, “Image registration for recovering affine transformation using
Nelder Mead Simplex method for optimization”. International Journal of Image Processing
(ISSN 1985-2304), 3(5), 218-228, November 2009.
[16] R. Bhagwat and A. Kulkarni, “An Overview of Registration Based and Registration Free
Methods for Cancelable Fingerprint Template”. International Journal of Computer Science
and Security (ISSN 1985-1553), 4(1), 23-30, March 2010.
J.Rajeesh, R.S.Moni, S.Palanikumar & T.Gopalakrishnan
J.Rajeesh rajeesh_j@yahoo.co.in
Senior Lecturer/Department of ECE
Noorul Islam College of Engineering
Kumaracoil, 629180, India
R.S.Moni moni2006_r_s@yahoo.co.in
Professor/Department of ECE
Noorul Islam University
Kumaracoil, 629180, India
S.Palanikumar palanikumarcsc@yahoo.com
Assistant Professor/Department of IT
Noorul Islam University
Kumaracoil, 629180, India
T.Gopalakrishnan gopalme@gmail.com
Lecturer/Department of ECE
Noorul Islam University
Kumaracoil, 629180, India
Abstract
Keywords: De-noising, Gaussian noise, Magnetic Resonance Images, Rician noise, Wave Atom
Shrinkage.
1. INTRODUCTION
De-noising of magnetic resonance (MR) images remains a critical issue, spurred partly by the
necessity of trading-off resolution, SNR, and acquisition speed, which results in images that still
demonstrate significant noise levels [1]–[7]. Sources of MR noise [8] include thermal noise (from
the conductivity of the system’s hardware), inductive losses (from the conductivity of the object
being imaged), sample resolution, and field-of-view (among others). Understanding the spatial
distribution of noise in an MR image is critical to any attempt to estimate the underpinning (true)
signal. The investigation of how noise is distributed in MR images (along with techniques
proposed to ameliorate the noise) has a long history. It was shown that pure noise in MR
magnitude images could be modeled as a Rayleigh distribution [1]. Afterwards, the Rician model
[4] was proposed as a more general model of noise in MR images. Reducing noise has always
been one of the standard problems of image analysis: the success of many analysis
techniques, such as segmentation and classification, depends largely on the image being noise-free.
Magnetic Resonance Imaging (MRI) is a notable medical imaging technique that has proven to be
particularly valuable for examination of the soft tissues in the body. MRI is an imaging technique
that makes use of the phenomenon of nuclear spin resonance. Since the discovery of MRI, this
technology has been used for many medical applications. Because of the resolution of MRI and
the technology being essentially harmless it has emerged as the most accurate and desirable
imaging technology [9]. MRI is primarily used to demonstrate pathological or other physiological
alterations of living tissues and is a commonly used form of medical imaging. Despite significant
improvements in recent years, magnetic resonance (MR) images often suffer from low SNR or
Contrast-to-Noise Ratio (CNR), especially in cardiac and brain imaging. This is problematic for
further tasks such as segmentation of important features, three-dimensional image
reconstruction, and registration. Therefore, noise reduction techniques are of great interest in MR
imaging as well as in other imaging modalities.
This paper presents a de-noising method for magnetic resonance images using wave atom
shrinkage that improves the SNR at both low and high noise levels. The paper is organized as
follows. Section II briefly reviews the work related to this paper. Section III describes the
theoretical concepts of the wavelet, curvelet, and wave atom transforms. Section IV discusses the
application of the wave atom, curvelet, and wavelet transforms to MRI and the resulting
observations. Section V concludes the paper with a brief discussion of the pros and cons of the
proposed method.
2. RELATED WORKS
The image processing literature presents a number of de-noising methods based on Partial
Differential Equations (PDEs) [10], some of which concentrate on MR images [11]–[14]. These
methods have the advantage of simplicity and remove the staircase effect that occurs with the
TV-norm filter. They impose, however, certain kinds of models on local image structure that are
often too simple to capture the complexity of anatomical MR images. Further, these methods
entail manual tuning of critical free parameters that control the conditions under which the models
prefer one sort of structure to another. These factors have been an impediment to the widespread
adoption of PDE-based techniques for processing MR images.
Another approach to image restoration is nonparametric statistical methods. For instance, [15],
[16] propose an unsupervised information-theoretic adaptive filter, UINTA, that relies on
nonparametric MRF models derived from the corrupted images. UINTA restores images by
generalizing the mean-shift procedure [17], [18] to incorporate neighborhood information. The
authors show that entropy measures on first-order image statistics are ineffective for de-noising
and, hence, advocate the use of higher-order/Markov statistics. UINTA, however, does not
assume a specific noise model during restoration. Along similar lines, [19], [20] propose a
de-noising strategy, NL-Means, that relies on principles of nonparametric regression.
Recently, many of the popular de-noising algorithms suggested are based on wavelet
thresholding [21]–[24]. These approaches attempt to separate significant features/signals from
noise in the frequency domain and simultaneously preserve them while removing noise. If the
wavelet transform is applied on MR magnitude data directly, both the wavelet and the scaling
coefficients of a noisy MRI image become biased estimates of their noise-free counterparts.
Therefore, it was suggested [22] that the application of the wavelet transform on squared MR
magnitude image data (which is noncentral chi-square distributed) would result in the wavelet
coefficients no longer being biased estimates of their noise-free counterparts. Although the bias
still remains in the scaling coefficients, it is not signal-dependent and can therefore be easily
removed [22], [24]. The difficulty with wavelet or anisotropic diffusion algorithms is again the risk
of over-smoothing fine details particularly in low SNR images [25].
From the points discussed above, it is understood that all of these algorithms carry the drawback
of over-smoothing fine details. In [26], it is stated that oscillatory functions or oriented textures
have a significantly sparser expansion in wave atoms than in other fixed standard representations
such as Gabor filters, wavelets, and curvelets. Because the mean of Rician noise is signal
dependent, this problem can be overcome by filtering the square of the noisy MR magnitude
image in the transformed coefficients [22].
3. THEORY
3.1. Wavelet
Wavelet bases are bases of nested function spaces, which can be used to analyze signals at
multiple scales. Wavelet coefficients carry both time and frequency information, as the basis
functions vary in position and scale. The fast wavelet transform (FWT) efficiently converts a
signal to its wavelet representation [27]. In a one-level FWT, a signal is split into an approximation
part and a detail part. In a multilevel FWT, each subsequent approximation is split again into an
approximation and a detail. For 2-D images, each approximation is split into an approximation
and three detail channels, containing horizontally, vertically, and diagonally oriented details,
respectively. The inverse FWT (IFWT) reconstructs each approximation from the approximation
and detail channels at the next level. If the wavelet basis functions do not have compact support,
the FWT is computed most efficiently in the frequency domain; this transform and its inverse are
called the Fourier-wavelet decomposition (FWD) and Fourier-wavelet reconstruction (FWR),
respectively.
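The one-level 2-D split described above can be illustrated with the Haar basis (chosen here only for brevity; this sketch is ours, not from [27]):

```python
def haar_fwt2_level(img):
    """One-level 2-D Haar FWT: split an even-sized image (2-D list)
    into an approximation channel (LL) and three detail channels.

    The orthonormal 2-D Haar step combines each 2x2 pixel block with
    weight 1/2, so total energy is preserved."""
    rows, cols = len(img), len(img[0])
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, rows, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, cols, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2.0)  # approximation (local average)
            row_lh.append((a - b + c - d) / 2.0)  # detail: differences across columns
            row_hl.append((a + b - c - d) / 2.0)  # detail: differences across rows
            row_hh.append((a - b - c + d) / 2.0)  # detail: diagonal differences
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh
```

A multilevel FWT simply repeats this split on the returned LL channel.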
3.2. Curvelet
The curvelet transform, like the wavelet transform, is a multiscale transform with frame elements
indexed by scale and location parameters. Unlike the wavelet transform, it has directional
parameters, and the curvelet pyramid [28][29] contains elements with a very high degree of
directional specificity. The elements obey a special (parabolic) scaling law, in which the length
and the width of the support of a frame element are linked by the relation width = length^2.
Curvelets are interesting because they efficiently address very important problems where
wavelets are far from ideal.
For example, curvelets provide optimally sparse representations of objects that are smooth
except for a discontinuity along a general curve with bounded curvature. Such representations
are nearly as sparse as if the object were not singular, and turn out to be far sparser than the
wavelet decomposition of the object.
3.3. Wave Atom
Demanet and Ying [31] introduced so-called wave atoms, which can be seen as a variant of 2-D
wavelet packets that obey the parabolic scaling of curvelets, wavelength = (diameter)^2.
Oscillatory functions or oriented textures (e.g., fingerprints, seismic profiles, engineering
surfaces) have a significantly sparser expansion in wave atoms than in other fixed standard
representations such as Gabor filters, wavelets, and curvelets.
Wave atoms have the ability to adapt to arbitrary local directions of a pattern, and to sparsely
represent anisotropic patterns aligned with the axes. In comparison to curvelets, wave atoms
capture not only the coherence of the pattern along the oscillations, but also the pattern across
the oscillations.
In the following, we briefly summarize the wave atom transform as recently suggested in [31];
see also [32] for a closely related approach.
c_{j,m,n} = \int u(x)\,\overline{\psi^{j}_{m,n}(x)}\,dx = \frac{1}{2\pi}\int e^{\,i 2^{-j} n \xi}\,\overline{\hat{\psi}^{j}_{m}(\xi)}\,\hat{u}(\xi)\,d\xi \qquad (11)
In the 2-D case, let \mu = (j, m, n), where m = (m_1, m_2) and n = (n_1, n_2). We consider

\varphi^{+}_{\mu}(x_1, x_2) = \psi^{j}_{m_1,n_1}(x_1)\,\psi^{j}_{m_2,n_2}(x_2) \qquad (12)

and the Hilbert-transformed wavelet packets

\varphi^{-}_{\mu}(x_1, x_2) = H\psi^{j}_{m_1,n_1}(x_1)\,H\psi^{j}_{m_2,n_2}(x_2) \qquad (13)
In [31], a discretization of this transform is described for the 1-D case, as well as an extension to
two dimensions. The algorithm is based on the fast Fourier transform and a wrapping trick. For
implementation software, we refer to Demanet and Ying's homepage,
http://www.waveatom.org/software.html.
The wave atom shrinkage can be formulated with a hard threshold function given by

h_\lambda(x) = \begin{cases} x, & |x| \ge \lambda \\ 0, & |x| < \lambda \end{cases} \qquad (16)

where \sigma, the noise standard deviation from which the threshold \lambda is derived, is
estimated by histogram-based techniques; x denotes the noise-free simulated image and
\hat{x} the noisy or de-noised image.
The shrinkage is obtained by

\hat{x} = T^{-1}\,h_\lambda\!\left(T(u)\right) \qquad (18)

where T denotes the forward wave atom transform and u the noisy image.
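The shrinkage pipeline — forward transform, hard threshold, inverse transform — can be sketched generically. Here a one-level 1-D Haar transform stands in for the wave atom transform (the real transform is available from the authors' referenced software), and all function names are ours:

```python
def hard_threshold(x, lam):
    """h_lambda: keep a coefficient if |x| >= lambda, zero it otherwise."""
    return x if abs(x) >= lam else 0.0

def haar1d(signal):
    """One-level orthonormal 1-D Haar transform of an even-length signal."""
    s = 2 ** 0.5
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def ihaar1d(approx, detail):
    """Exact inverse of haar1d."""
    s = 2 ** 0.5
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def shrink_denoise(signal, lam):
    """x_hat = T_inverse( h_lambda( T(u) ) ): threshold only the detail band,
    since the approximation carries most of the signal energy."""
    approx, detail = haar1d(signal)
    detail = [hard_threshold(d, lam) for d in detail]
    return ihaar1d(approx, detail)
```

Replacing `haar1d`/`ihaar1d` with the wave atom forward/inverse transforms gives the scheme of the paper.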
Analysis is made under four conditions: i) fixed high SNR with varying threshold, ii) fixed low
SNR with varying threshold, iii) fixed low threshold with varying SNR, and iv) fixed high threshold
with varying SNR.
i) The chosen SNR is 19.0505 dB and the threshold is varied from 0.03 to 0.3. The observations
are given in Fig 1: wave atom shrinkage gives a higher SNR at all threshold values compared to
wavelet and curvelet shrinkage. The performances of the models are shown in Fig 2 for the
threshold 0.06.
ii) The chosen SNR is 9.21 dB and the threshold is varied from 0.03 to 0.3. The observations are
given in Fig 3: wave atom shrinkage gives a higher SNR at all threshold values compared to
wavelet and curvelet shrinkage, except at the thresholds 0.03, 0.06, and 0.3, where curvelet
shrinkage performs better. The performances of the models are shown in Fig 4 for the threshold
0.24.
iii) Here the threshold is fixed at 0.06 and the SNR is varied from 9.37 dB to 19.12 dB. The
proposed method increases the SNR by a maximum of 16.6%, compared to 14.6% for wavelet
and 8% for curvelet. The analysis is presented in Fig 5.
iv) Here the threshold is fixed at 0.24 and the SNR is varied from 9.26 dB to 18.87 dB. The
proposed method increases the SNR by a maximum of 57%, compared to 52% for wavelet and
52% for curvelet. The analysis is presented in Fig 6.
It is observed that the performance of all filters depends on the proper selection of the threshold
value and on the SNR of the noisy image. The proposed method also performs best against the
methods given in [35], with a maximum SNR increase of 57%, compared to 34% for anisotropic
diffusion and 18.1% for UINTA.
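The paper does not spell out its SNR formula; assuming the standard definition over the noise-free image x and the noisy or de-noised image x̂ (our assumption, with a hypothetical function name), the figure of merit can be computed as:

```python
import math

def snr_db(clean, test):
    """SNR in dB between a noise-free image x (clean) and a noisy or
    de-noised image x_hat (test), both given as flat pixel sequences:
    SNR = 10 * log10( sum(x^2) / sum((x - x_hat)^2) )."""
    signal_power = sum(v * v for v in clean)
    noise_power = sum((v - w) ** 2 for v, w in zip(clean, test))
    return 10.0 * math.log10(signal_power / noise_power)
```

A percentage improvement such as the 57% quoted above would then compare the de-noised SNR against the noisy-input SNR.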
FIGURE 2: High SNR images (a) Noisy image (b) De-noised using Wave Atom (c) De-noised using Wavelet
(d) De-noised using Curvelet.
FIGURE 4: Low SNR images (a) Noisy image (b) De-noised using Wave Atom (c) De-noised using
Wavelet (d) De-noised using Curvelet.
FIGURE 5: Performance between noisy and de-noised images with the threshold of 0.06.
FIGURE 6: Performance between noisy and de-noised images with the threshold of 0.24.
FIGURE 7: Real images (a) Noisy image (b) De-noised using Wave Atom (c) De-noised using Wavelet
(d) De-noised using Curvelet.
5. CONCLUSION
A novel scheme is proposed for the de-noising of magnetic resonance images using wave atom
shrinkage. The proposed approach is shown to achieve a better SNR than wavelet and curvelet
shrinkage, and its edge-preserving property is a clear advantage. Evaluating the method on a
large dataset of real normal and pathological MR images would further demonstrate its efficiency.
Future work is to analyze the performance of the proposed method on other MRI modalities such
as T2 and PD.
6. REFERENCES
1. W. A. Edelstein, P. A. Bottomley, and L. M. Pfeifer. A signal-to-noise calibration procedure for
NMR imaging systems. Med. Phys 1984;11:2:180–185.
2. E. R. McVeigh, R. M. Henkelman, and M. J. Bronskill. Noise and filtration in magnetic
resonance imaging. Med. Phys 1985;12:5:586–591.
3. R. M. Henkelman. Measurement of signal intensities in the presence of noise in MR images.
Med. Phys 1985;12:2:232–233.
4. M. A. Bernstein, D. M. Thomasson, and W. H. Perman. Improved detectability in low signal-to-
noise ratio magnetic resonance images by means of phase-corrected real construction. Med.
Phys 1989;16:5:813–817.
5. M. L. Wood, M. J. Bronskill, R. V. Mulkern, and G. E. Santyr. Physical MR desktop data. Magn
Reson Imaging 1994;3:19–24.
6. H. Gudbjartsson and S. Patz. The Rician distribution of noisy MRI data. Magn Reson Med
1995;34:6:910–914.
7. A. Macovski. Noise in MRI. Magn Reson Med 1996;36:3:494–497.
8. W. A. Edelstein, G. H. Glover, C. J. Hardy, and R. W. Redington. The intrinsic SNR in NMR
imaging. Magn Reson Med 1986;3:4:604–618.
9. G.A. Wright. Magnetic Resonance Imaging. IEEE Signal Processing Magazine 1997;1:56-66.
10. X. Tai, K. Lie, T. Chan, and S. Osher, Eds. Image Processing based on Partial Differential
Equations 2005; New York: Springer.
11. G. Gerig, O. Kubler, R. Kikinis, and F. A. Jolesz. Nonlinear anisotropic filtering of MRI data.
IEEE Trans Med Imag 1992;11:2:221–232.
12. M. Lysaker, A. Lundervold, and X. Tai. Noise removal using fourth-order partial differential
equation with applications to medical magnetic resonance images in space and time. IEEE Trans
Image Process 2003;12:12:1579–1590.
13. A. Fan, W. Wells, J. Fisher, M. Çetin, S. Haker, R. Mulkern, C. Tempany, and A.Willsky. A
unified variational approach to denoising and bias correction in MR. Inf Proc Med Imag
2003;148–159.
14 S. Basu, P. T. Fletcher, and R. T. Whitaker. Rician noise removal in diffusion tensor MRI. Med
Imag Comput Comput Assist Intervention 2006;117–125.
15. S. P. Awate and R. T. Whitaker. Higher-order image statistics for unsupervised, information-
theoretic, adaptive, image filtering. Proc IEEE Int Conf. Comput Vision Pattern Recognition
2005;2:44–51.
16 S. P. Awate and R. T. Whitaker. Unsupervised, information-theoretic, adaptive image filtering
for image restoration. IEEE Trans Pattern Anal Mach Intell 2006;28:3:364–376.
17. K. Fukunaga and L. Hostetler. The estimation of the gradient of a density function, with
applications in pattern recognition. IEEE Trans Inf Theory 1975;21:1:32–40.
18. D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis.
IEEE Trans Pattern Anal Mach Intell 2002;24:5:603–619.
19. A. Buades, B. Coll, and J. M. Morel. A non-local algorithm for image denoising. IEEE Int Conf
Comp Vis Pattern Recog 2005;2:60–65.
20. A. Buades, B. Coll, and J. M. Morel. A review of image denoising algorithms, with a new one.
Multiscale Modeling Simulation 2005;4:2:490–530.
21. J. B. Weaver, Y. Xu, D. M. Healy Jr., and L. D. Cromwell. Filtering noise from images with
wavelet transforms. Magn Reson Med 1991;21:2:288–295.
22. R. D. Nowak. Wavelet-based Rician noise removal for magnetic resonance imaging. IEEE
Trans Image Process 1999;8:10:1408–1419.
23. A. M. Wink and J. B. T. M. Roerdink. Denoising functional MR images: A comparison of
wavelet denoising and Gaussian smoothing. IEEE Trans Image Process 2004;23:3:374–387.
Abstract
The quest for better and faster retrieval techniques has continually fuelled research in content
based image retrieval (CBIR). The paper presents innovative CBIR techniques based on feature
vectors formed from fractional coefficients of transformed images, using the Discrete Cosine,
Walsh, Haar and Kekre's transforms. The energy compaction of these transforms into a few
low-order coefficients is exploited to greatly reduce the feature vector size per image by taking
fractional coefficients of the transformed image. Feature vectors are extracted from the
transformed image in several ways: first considering all the coefficients of the transformed image,
and then fourteen reduced coefficient sets (50%, 25%, 12.5%, 6.25%, 3.125%, 1.5625%,
0.7813%, 0.39%, 0.195%, 0.097%, 0.048%, 0.024%, 0.012% and 0.06% of the complete
transformed image). The four transforms are applied to the gray equivalents and to the colour
components of images to extract Gray and RGB feature sets, respectively. Instead of using all
coefficients of the transformed image as the feature vector for image retrieval, these fourteen
reduced coefficient sets are used for both gray and RGB feature vectors, resulting in better
performance and lower computation. The proposed CBIR techniques are implemented on a
database of 1000 images spread across 11 categories. For each proposed CBIR technique, 55
queries (5 per category) are fired at the database, and the net average precision and recall are
computed for all feature sets per transform. The results show performance improvement (higher
precision and recall values) with fractional coefficients compared to the complete transform of the
image, at reduced computation, resulting in faster retrieval. Finally, Kekre's transform surpasses
all the other discussed transforms, with the highest precision and recall values for fractional
coefficients (6.25% and 3.125% of all coefficients) and computation lowered by 94.08%
compared to the DCT.
Keywords: CBIR, Discrete Cosine Transform (DCT), Walsh Transform, Haar Transform, Kekre’s
Transform, Fractional Coefficients, Feature Vector.
Dr. H. B. Kekre, Sudeep D. Thepade, Akshay Maloo
1. INTRODUCTION
Computer systems face a large number of challenges in storing/transmitting and
indexing/managing the large numbers of images generated from a variety of sources. Storage
and transmission are taken care of by image compression, in which significant advancements
have been made [1,4,5]. Image databases deal with the challenge of image indexing and
retrieval [2,6,7,10,11], which has become one of the promising and important research areas for
researchers from a wide range of disciplines such as computer vision, image processing, and
databases. The quest for better and faster image retrieval techniques continues to attract
researchers working on important applications of CBIR technology such as art galleries [12,14],
museums, archaeology [3], architecture design [8,13], geographic information systems [5],
weather forecasting [5,22], medical imaging [5,18], trademark databases [21,23], criminal
investigations [24,25], and image search on the Internet [9,19,20].
1.1 Content Based Image Retrieval
In the literature, the term content based image retrieval (CBIR) was used for the first time by
Kato et al. [4] to describe experiments on automatic retrieval of images from a database by
colour and shape features. A typical CBIR system performs two major tasks [16,17]. The first is
feature extraction (FE), where a set of features, called the feature vector, is generated to
accurately represent the content of each image in the database. The second is similarity
measurement (SM), where the distance between the query image and each image in the
database, computed from their feature vectors, is used to retrieve the “closest” images
[16,17,26]. For CBIR feature extraction, the two main approaches are feature extraction in the
spatial domain [5] and feature extraction in the transform domain [1]. Feature extraction in the
spatial domain includes CBIR techniques based on histograms [5], BTC [2,16,23], and VQ
[21,25,26]. Transform domain methods are widely used in image compression, as they give high
energy compaction in the transformed image [17,24], so it is natural to use images in the
transform domain for feature extraction in CBIR [1]. Because the transform compacts the energy
into a few elements, a large number of the coefficients of the transformed image can be
neglected to reduce the size of the feature vector [1]. Reducing the feature vector size using
fractional coefficients of the transformed image while still improving the performance of image
retrieval is the theme of the work presented here. Many current CBIR systems use the average
Euclidean distance [1,2,3,8-14,23] on the extracted feature set as a similarity measure. The
direct Average Euclidean Distance (AED) between image P and query image Q is given by
equation (1), where Vpi and Vqi are the feature vectors of image P and query image Q
respectively, each of size ‘n’.
AED = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Vp_i - Vq_i\right)^2} \qquad (1)
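Equation (1) translates directly into code (hypothetical helper name):

```python
def average_euclidean_distance(vp, vq):
    """AED between feature vectors Vp and Vq of equal size n, per equation (1):
    sqrt( (1/n) * sum_i (Vp_i - Vq_i)^2 )."""
    n = len(vp)
    return (sum((p - q) ** 2 for p, q in zip(vp, vq)) / n) ** 0.5
```

The query's feature vector is compared against every database feature vector with this measure, and the images with the smallest AED are returned as the "closest" matches.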
2. DISCRETE COSINE TRANSFORM

B_{pq} = \alpha_p \alpha_q \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} A_{mn}\,\cos\frac{\pi(2m+1)p}{2M}\,\cos\frac{\pi(2n+1)q}{2N},\quad 0 \le p \le M-1,\; 0 \le q \le N-1 \qquad (2)

\alpha_p = \begin{cases} \sqrt{1/M}, & p = 0 \\ \sqrt{2/M}, & 1 \le p \le M-1 \end{cases} \qquad (3)

\alpha_q = \begin{cases} \sqrt{1/N}, & q = 0 \\ \sqrt{2/N}, & 1 \le q \le N-1 \end{cases} \qquad (4)
where M and N are the row and column size of A, respectively. If the DCT is applied to real data,
the result is also real. The DCT tends to concentrate information, making it useful for image
compression applications and also helping to minimize the feature vector size in CBIR [23]. For a
full 2-dimensional DCT of an NxN image, the number of multiplications required is N^2(2N) and
the number of additions required is N^2(2N-2).
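For clarity, equation (2) can be implemented directly as a naive O(N^4) double loop (real systems use fast algorithms; this sketch is ours):

```python
import math

def dct2(A):
    """2-D DCT of an M x N matrix A per equations (2)-(4).

    Direct evaluation of the definition; cost matches the N^2(2N)
    multiplication count quoted for the unoptimized transform."""
    M, N = len(A), len(A[0])
    B = [[0.0] * N for _ in range(M)]
    for p in range(M):
        ap = math.sqrt(1.0 / M) if p == 0 else math.sqrt(2.0 / M)
        for q in range(N):
            aq = math.sqrt(1.0 / N) if q == 0 else math.sqrt(2.0 / N)
            s = 0.0
            for m in range(M):
                for n in range(N):
                    s += (A[m][n]
                          * math.cos(math.pi * (2 * m + 1) * p / (2 * M))
                          * math.cos(math.pi * (2 * n + 1) * q / (2 * N)))
            B[p][q] = ap * aq * s
    return B
```

For a constant image all the energy compacts into the single DC coefficient B[0][0], which is exactly the behaviour the fractional-coefficient feature vectors exploit.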
3. WALSH TRANSFORM
The Walsh transform matrix [1,11,18,19,26,30] is defined as a set of N rows, denoted Wj, for
j = 0, 1, ..., N-1, which have the following properties:
- Wj takes on the values +1 and -1.
- Wj[0] = 1 for all j.
- Wj x Wk^T = 0 for j ≠ k, and Wj x Wk^T = N for j = k.
- Wj has exactly j zero crossings, for j = 0, 1, ..., N-1.
- Each row Wj is either even or odd with respect to its midpoint.
The Walsh transform matrix is defined using a Hadamard matrix of order N. Each row of the
Walsh transform matrix is the row of the Hadamard matrix specified by the Walsh code index,
which must be an integer in the range [0, ..., N-1]. For a Walsh code index equal to an integer j,
the respective Hadamard output code has exactly j zero crossings. The steps of the algorithm to
generate the Walsh matrix by reordering the Hadamard matrix are given below [30].
Step 1 : Let H be the Hadamard matrix of size NxN and W be the expected Walsh matrix of the
same size
Step 2 : Let seq=0, cseq=0, seq(0)=0, seq(1)=1, i=0
Step 3 : Repeat steps 3 to 12 till i <= log2(N)-2
Step 4 : s=size(seq)
Step 5 : Let j=1, Repeat steps 6 and 7 till j<=s(1)
Step 6 : cseq(j)=2*seq(j)
Step 7 : j=j+1
Step 8 : Let p=1, k=2*s(2) repeat steps 9 to 11 until k<=s(2)+1
Step 9 : cseq(k)=cseq(p)+1
Step 10 : p=p+1 and k=k-1
Step 11 : seq=cseq,
Step 12 : i=i+1
Step 13 : Let seq=seq+1
Step 14 : Let x and y indicate the rows and columns of ‘seq’
Step 15 : Let i=0 Repeat steps 16 and 17 till i<= y-1
Step 16 : q=seq(i)
Step 17 : i=i+1
Step 18 : Let i=0, repeat steps 19 to 22 till i<=s1-1
Step 19 : for j=0, repeat steps 20 and 21 till j<=s1-1
Step 20 : W(i,j)=H(seq(i),j)
Step 21 : j=j+1
Step 22 : i=i+1
For the full 2-dimensional Walsh transform applied to an image of size NxN, the number of
additions required is 2N^2(N-1), and no multiplications at all are needed in the Walsh
transform [1].
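Equivalently to the step-by-step reordering above, the Walsh matrix can be obtained by generating the Hadamard matrix and sorting its rows by sequency (number of sign changes), since row Wj must have exactly j zero crossings. This is our sketch, not the authors' code:

```python
def hadamard(n):
    """Hadamard matrix of order n (n a power of 2), entries +1/-1,
    built by the Sylvester doubling construction."""
    H = [[1]]
    while len(H) < n:
        H = ([row + row for row in H] +
             [row + [-v for v in row] for row in H])
    return H

def walsh(n):
    """Walsh matrix: Hadamard rows reordered so that row j has exactly
    j sign changes (sequency order)."""
    def sequency(row):
        return sum(1 for a, b in zip(row, row[1:]) if a != b)
    return sorted(hadamard(n), key=sequency)
```

Because Hadamard row sequencies are a permutation of 0..N-1, the sort yields exactly the Wj ordering described above.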
4. HAAR TRANSFORM
This sequence was proposed in 1909 by Alfréd Haar [28]. Haar used these functions to give an
example of a countable orthonormal system for the space of square-integrable functions on the
real line. The study of wavelets, and even the term "wavelet", did not come until much later
[29,31]. The Haar wavelet is also the simplest possible wavelet. Its technical disadvantage is that
it is not continuous, and therefore not differentiable. This property can, however, be an
advantage for the analysis of signals with sudden transitions, such as monitoring tool failure in
machines. The Haar wavelet's mother wavelet function ψ(t) and its scaling function φ(t) can be
described as:
\psi(t) = \begin{cases} 1, & 0 \le t < 1/2 \\ -1, & 1/2 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (5)

\varphi(t) = \begin{cases} 1, & 0 \le t < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (6)
5. KEKRE’S TRANSFORM
Kekre’s transform matrix is the generic version of Kekre’s LUV color space matrix
[1,8,12,13,15,22]. Kekre’s transform matrix can be of any size NxN, which need not be a power
of 2 (as is the case with most other transforms). All diagonal and upper-diagonal values of
Kekre’s transform matrix are one, while the lower-diagonal part, except for the values just below
the diagonal, is zero.
K_{N \times N} = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 & 1 \\
-N+1 & 1 & 1 & \cdots & 1 & 1 \\
0 & -N+2 & 1 & \cdots & 1 & 1 \\
\vdots & & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 1 \\
0 & 0 & 0 & \cdots & -N+(N-1) & 1
\end{bmatrix} \qquad (7)

The formula for generating the term K_{xy} of Kekre’s transform matrix is:

K_{xy} = \begin{cases} 1, & x \le y \\ -N + (x-1), & x = y+1 \\ 0, & x > y+1 \end{cases} \qquad (8)
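The generating rule for the matrix entries — ones on and above the diagonal, -N+x on the sub-diagonal, zeros elsewhere below — can be coded directly (hypothetical function name):

```python
def kekre_matrix(N):
    """Kekre's transform matrix of size N x N: entries are 1 on and above
    the diagonal, -N + (x - 1) just below the diagonal (rows/columns
    counted from 1), and 0 elsewhere below the diagonal."""
    K = [[0] * N for _ in range(N)]
    for x in range(1, N + 1):
        for y in range(1, N + 1):
            if x <= y:
                K[x - 1][y - 1] = 1
            elif x == y + 1:
                K[x - 1][y - 1] = -N + (x - 1)
    return K
```

Unlike the Hadamard-derived Walsh matrix, N here can be any integer, which is the flexibility the section highlights; the rows are mutually orthogonal by construction.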
TABLE 1: Computational Complexity for applying transforms to image of size NxN [1]
ii. Take the average of the Red, Green and Blue components of each pixel to get the gray image.
iii. Apply the transform ‘T’ to the gray image to extract the feature vector.
iv. The result is stored as the complete feature vector ‘T-Gray’ for the respective image.
Thus the feature vector databases for the DCT, Walsh, Haar and Kekre’s transforms are
generated as DCT-Gray, Walsh-Gray, Haar-Gray and Kekre’s-Gray respectively. Here the size of
the feature vector is NxN for every transform.
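These steps can be sketched as follows, with any of the four transforms supplied as `transform2d`. The fractional-coefficient selection shown here (a top-left square crop of the coefficient matrix) is our assumption about how the reduced sets are formed, and the function name is ours:

```python
def gray_feature_vector(rgb_image, transform2d, fraction=1.0):
    """Build a T-Gray feature vector: average R, G, B per pixel into a
    gray image, apply the 2-D transform, and keep only the top-left
    square of coefficients holding roughly `fraction` of the total
    (fraction = 1.0 keeps the complete NxN feature vector)."""
    gray = [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb_image]
    coeffs = transform2d(gray)
    # A square of side sqrt(fraction) * N contains `fraction` of all entries.
    keep = max(1, int(round(len(coeffs) * fraction ** 0.5)))
    return [coeffs[i][j] for i in range(keep) for j in range(keep)]
```

The T-RGB variant applies `transform2d` to each colour plane separately and concatenates the three results, giving a feature database of size NxNx3.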
ii. Apply the transform ‘T’ to the individual color planes of the image to extract the feature vector.
iii. The result is stored as the complete feature vector ‘T-RGB’ for the respective image.
Thus the feature vector databases for the DCT, Walsh, Haar and Kekre’s transforms are
generated as DCT-RGB, Walsh-RGB, Haar-RGB and Kekre’s-RGB respectively. Here the size of
the feature database is NxNx3.
Figure 2 gives the sample database images from all categories, including
scenery, flowers, buses, animals, aeroplanes, monuments, and tribal people. To
assess the retrieval effectiveness, we have used precision and recall as
statistical comparison parameters [1,2] for the proposed CBIR techniques. The
standard definitions of these two measures are:

Precision = (number of relevant images retrieved) / (total number of images retrieved)

Recall = (number of relevant images retrieved) / (total number of relevant images in the database)
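As a concrete sketch (our own illustration; the function name is hypothetical), the two measures can be computed from a retrieved list and the set of relevant images:

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / total retrieved;
    recall = relevant retrieved / total relevant in the database."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)
```

For instance, retrieving images [1, 2, 3, 4] when images {2, 4, 6} are relevant gives a precision of 0.5 (2 of 4 retrieved are relevant) and a recall of about 0.667 (2 of 3 relevant were found); the crossover point of the two averaged curves is the single figure of merit used in the comparisons below.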
retrieval could be greatly reduced, which ultimately results in faster query
execution in CBIR with better performance. In all cases, Kekre’s transform with
fractional coefficients (3.125% in Gray and 6.25% in RGB) gives the best
performance, with the highest crossover points of average precision and average
recall. Feature extraction using Kekre’s transform is also computationally
lighter than with the DCT or Walsh transform, so feature extraction in less
time is possible with increased performance.
Finally, it can be concluded from the proposed techniques and the
experimentation that fractional coefficients give better discrimination
capability in CBIR than the complete set of transformed coefficients, and that
image retrieval with better performance at a much faster rate is possible.
11. REFERENCES
1. H.B.Kekre, Sudeep D. Thepade, “Improving the Performance of Image Retrieval using
Partial Coefficients of Transformed Image”, International Journal of Information Retrieval
(IJIR), Serials Publications, Volume 2, Issue 1, 2009, pp. 72-79(ISSN: 0974-6285)
2. H.B.Kekre, Sudeep D. Thepade, “Image Retrieval using Augmented Block Truncation
Coding Techniques”, ACM International Conference on Advances in Computing,
Communication and Control (ICAC3-2009), pp. 384-390, 23-24 Jan 2009, Fr.
Conceicao Rodrigues College of Engg., Mumbai. Available on the online ACM portal.
3. H.B.Kekre, Sudeep D. Thepade, “Scaling Invariant Fusion of Image Pieces in Panorama
Making and Novel Image Blending Technique”, International Journal on Imaging (IJI),
www.ceser.res.in/iji.html, Volume 1, No. A08, pp. 31-46, Autumn 2008.
4. Hirata K. and Kato T. “Query by visual example – content-based image retrieval”, In
Proc. of Third International Conference on Extending Database Technology, EDBT’92,
1992, pp 56-71
5. H.B.Kekre, Sudeep D. Thepade, “Rendering Futuristic Image Retrieval System”, National
Conference on Enhancements in Computer, Communication and Information
Technology, EC2IT-2009, 20-21 Mar 2009, K.J.Somaiya College of Engineering,
Vidyavihar, Mumbai-77.
6. Minh N. Do, Martin Vetterli, “Wavelet-Based Texture Retrieval Using Generalized
Gaussian Density and Kullback-Leibler Distance”, IEEE Transactions On Image
Processing, Volume 11, Number 2, pp.146-158, February 2002.
7. B.G.Prasad, K.K. Biswas, and S. K. Gupta, “Region-based image retrieval using
integrated color, shape, and location index”, International Journal on Computer Vision
and Image Understanding Special Issue: Colour for Image Indexing and Retrieval,
Volume 94, Issues 1-3, April-June 2004, pp.193-233.
8. H.B.Kekre, Sudeep D. Thepade, “Creating the Color Panoramic View using Medley of
Grayscale and Color Partial Images ”, WASET International Journal of Electrical,
Computer and System Engineering (IJECSE), Volume 2, No. 3, Summer 2008. Available
online at www.waset.org/ijecse/v2/v2-3-26.pdf.
9. Stian Edvardsen, “Classification of Images using color, CBIR Distance Measures and
Genetic Programming”, Ph.D. Thesis, Master of science in Informatics, Norwegian
university of science and Technology, Department of computer and Information science,
June 2006.
10. H.B.Kekre, Tanuja Sarode, Sudeep D. Thepade, “DCT Applied to Row Mean and
Column Vectors in Fingerprint Identification”, In Proceedings of International Conference
on Computer Networks and Security (ICCNS), 27-28 Sept. 2008, VIT, Pune.
11. Zhibin Pan, Kotani K., Ohmi T., “Enhanced fast encoding method for vector quantization
by finding an optimally-ordered Walsh transform kernel”, ICIP 2005, IEEE International
Conference, Volume 1, pp I - 573-6, Sept. 2005.
12. H.B.Kekre, Sudeep D. Thepade, “Improving ‘Color to Gray and Back’ using Kekre’s LUV
Color Space”, IEEE International Advanced Computing Conference 2009 (IACC’09),
Thapar University, Patiala, INDIA, 6-7 March 2009. Available online at IEEE Xplore.
13. H.B.Kekre, Sudeep D. Thepade, “Image Blending in Vista Creation using Kekre's LUV
Color Space”, SPIT-IEEE Colloquium and International Conference, Sardar Patel
Institute of Technology, Andheri, Mumbai, 04-05 Feb 2008.
14. H.B.Kekre, Sudeep D. Thepade, “Color Traits Transfer to Grayscale Images”, In Proc.of
IEEE First International Conference on Emerging Trends in Engg. & Technology,
(ICETET-08), G.H.Raisoni COE, Nagpur, INDIA. Uploaded on online IEEE Xplore.
15. http://wang.ist.psu.edu/docs/related/Image.orig (Last referred on 23 Sept 2008)
16. H.B.Kekre, Sudeep D. Thepade, “Using YUV Color Space to Hoist the Performance of
Block Truncation Coding for Image Retrieval”, IEEE International Advanced Computing
Conference 2009 (IACC’09), Thapar University, Patiala, INDIA, 6-7 March 2009.
17. H.B.Kekre, Sudeep D. Thepade, Archana Athawale, Anant Shah, Prathmesh Verlekar,
Suraj Shirke, “Energy Compaction and Image Splitting for Image Retrieval using Kekre
Transform over Row and Column Feature Vectors”, International Journal of Computer
Science and Network Security (IJCSNS),Volume:10, Number 1, January 2010, (ISSN:
1738-7906) Available at www.IJCSNS.org.
18. H.B.Kekre, Sudeep D. Thepade, Archana Athawale, Anant Shah, Prathmesh Verlekar,
Suraj Shirke, “Walsh Transform over Row Mean and Column Mean using Image
Fragmentation and Energy Compaction for Image Retrieval”, International Journal on
Computer Science and Engineering (IJCSE),Volume 2S, Issue1, January 2010, (ISSN:
0975–3397). Available online at www.enggjournals.com/ijcse.
19. H.B.Kekre, Sudeep D. Thepade,“Image Retrieval using Color-Texture Features
Extracted from Walshlet Pyramid”, ICGST International Journal on Graphics, Vision and
Image Processing (GVIP), Volume 10, Issue I, Feb.2010, pp.9-18, Available online
www.icgst.com/gvip/Volume10/Issue1/P1150938876.html
20. H.B.Kekre, Sudeep D. Thepade,“Color Based Image Retrieval using Amendment Block
Truncation Coding with YCbCr Color Space”, International Journal on Imaging (IJI),
Volume 2, Number A09, Autumn 2009, pp. 2-14. Available online at
www.ceser.res.in/iji.html (ISSN: 0974-0627).
21. H.B.Kekre, Tanuja Sarode, Sudeep D. Thepade,“Color-Texture Feature based Image
Retrieval using DCT applied on Kekre’s Median Codebook”, International Journal on
Imaging (IJI), Volume 2, Number A09, Autumn 2009,pp. 55-65. Available online at
www.ceser.res.in/iji.html (ISSN: 0974-0627).
22. H.B.Kekre, Sudeep D. Thepade, “Image Retrieval using Non-Involutional Orthogonal
Kekre’s Transform”, International Journal of Multidisciplinary Research and Advances in
Engineering (IJMRAE), Ascent Publication House, 2009, Volume 1, No.I, pp 189-203,
2009. Abstract available online at www.ascent-journals.com (ISSN: 0975-7074)
23. H.B.Kekre, Sudeep D. Thepade, “Boosting Block Truncation Coding using Kekre’s LUV
Color Space for Image Retrieval”, WASET International Journal of Electrical, Computer
and System Engineering (IJECSE), Volume 2, Number 3, pp. 172-180, Summer 2008.
Available online at http://www.waset.org/ijecse/v2/v2-3-23.pdf
24. H.B.Kekre, Sudeep D. Thepade, Archana Athawale, Anant Shah, Prathmesh Verlekar,
Suraj Shirke, “Performance Evaluation of Image Retrieval using Energy Compaction and
Image Tiling over DCT Row Mean and DCT Column Mean”, Springer-International
Conference on Contours of Computing Technology (Thinkquest-2010),
Babasaheb Gawde Institute of Technology, Mumbai, 13-14 March 2010. To appear on
SpringerLink.
25. H.B.Kekre, Tanuja K. Sarode, Sudeep D. Thepade, Vaishali Suryavanshi, “Improved
Texture Feature Based Image Retrieval using Kekre’s Fast Codebook Generation
Algorithm”, Springer-International Conference on Contours of Computing Technology
(Thinkquest-2010), Babasaheb Gawde Institute of Technology, Mumbai, 13-14 March
2010. To appear on SpringerLink.
26. H.B.Kekre, Tanuja K. Sarode, Sudeep D. Thepade, “Image Retrieval by Kekre’s
Transform Applied on Each Row of Walsh Transformed VQ Codebook”, (Invited), ACM-
International Conference and Workshop on Emerging Trends in Technology (ICWET
2010), Thakur College of Engg. and Tech., Mumbai, 26-27 Feb 2010. The paper was
invited at ICWET 2010 and will also be uploaded on the online ACM portal.
27. H.B.Kekre, Sudeep D. Thepade, Akshay Maloo, “Image Retrieval using Fractional
Coefficients of Transformed Image using DCT and Walsh Transform”, IJEST.
28. Haar, Alfred, “Zur Theorie der orthogonalen Funktionensysteme” (German),
Mathematische Annalen, Volume 69, No. 3, 1910, pp. 331-371.
29. Charles K. Chui, “An Introduction to Wavelets”, Academic Press, 1992, San Diego, ISBN
0585470901.
30. H. B. Kekre, Tanuja K. Sarode, V. A. Bharadi, A. Agrawal, R. Arora, M. Nair,
“Performance Comparison of Full 2-D DCT, 2-D Walsh and 1-D Transform over Row
Mean and Column Mean for Iris Recognition” International Conference and Workshop on
Emerging Trends in Technology (ICWET 2010) – 26-27 February 2010, TCET, Mumbai,
India.
31. M. C. Padma, P. A. Vijaya, “Wavelet Packet Based Features for Automatic Script
Identification”, International Journal Of Image Processing (IJIP), CSC Journals, 2009,
Volume 4, Issue 1, Pg.53-65.
Ratika Pradhan, Shikhar Kumar, Ruchika Agarwal, Mohan P. Pradhan & M. K. Ghose
M. K. Ghose mkghose@smu.edu.in
Department of CSE, SMIT, Rangpo, Sikkim, INDIA
Abstract
Keywords: Topographic map, Contour line, Tracing, Moore neighborhood, Digital Elevation Map (DEM)
1. INTRODUCTION
A topographic map is a type of map that provides a detailed and graphical
representation of natural features on the ground. Topographic maps
conventionally show topography, or land contours, by means of contour lines.
These maps usually show not only the contours, but also any significant
streams, other water bodies, forest cover, built-up areas or individual
buildings (depending on scale) and other features. These maps are taken as
reference or base maps for many Remote Sensing and GIS based applications for
generating thematic maps such as drainage maps, slope maps, road maps and land
cover maps. The important and distinct characteristic of these maps is that the
earth’s surface can be mapped using contour lines. The digitization or
vectorization process for generating a contour map for a state like Sikkim,
where there is large variation of slope, takes a tremendous amount of time and
manpower. Many research works are currently being conducted in this field to
automate the entire digitization process. To date, no fully automated
digitization process provides satisfactory results.
Contour lines are imaginary lines that join points of equal elevation on the
earth’s surface with reference to mean sea level, or curves that connect
contiguous points of the same altitude (isohypses). These lines are depicted in
brown on topographic maps, and are smooth and continuous curves with a width of
three to four pixels. They run almost parallel and may be taken as
non-intersecting lines, except at steep cliffs. However, along with the contour
lines, topographic maps also contain text information overlaid on these lines.
This makes the entire automation of extracting and tracing contour lines from
contour maps more complex and difficult.
The traditional method for vectorization of contour lines mainly involves the following steps:
Scanning paper topographic maps using high resolution scanner.
Registration of one or more maps with reference to the nearest datum.
Mosaicing or stitching various topographic maps.
Vectorization of various contour lines manually using line tracing by rubber band method.
Feeding depth information for each contour line.
Generating digital elevation models (DEM) for 3D surface reconstruction.
The use of computers and digital topographic maps has made the task simpler.
Currently, research is being carried out on automatic extraction of contour
lines from topographic maps, which involves the following five main tasks.
Registration of topographic map.
Filtering for enhancing map.
Color segmentation for extracting contour lines.
Thinning and pruning the binary images.
Raster to vector conversion.
The proposed work suggests a method that efficiently extracts contour lines,
performs tracing of contour lines and prepares a database wherein the user can
feed the height values interactively. In this paper, we propose a modified
Moore’s Neighbor contour tracing algorithm to trace all contours in the given
topographic maps. The content of the paper is organized as follows. In section
II we summarize the related work carried out in this area. In section III, we
discuss the contour extraction and thinning algorithms. In section IV, we
discuss the original Moore’s Neighbor contour tracing algorithm, followed by
the Modified Moore’s Neighbor algorithm in section V. The results and
discussion in section VI provide detailed results for the study area and a
comparison of the two algorithms. Finally, the conclusion and future scope are
given in section VII.
2. RELATED WORK
Many researchers have worked to come up with a technique to completely automate
information extraction from topographic maps. Leberl and Olson [1] suggested a
method that involves the entire four tasks mentioned above for automatic
vectorization of clean contours and drainage. Greenle [2] made an attempt to
extract elevation contour lines from topographic maps. Soille and Arrighi [3]
suggested an image based approach using mathematical morphology operators to
reconstruct contour lines. Most of these procedures fail at discontinuities.
Frischknecht [4] used a hierarchical template matching algorithm that extracts
text but fails to extract contour lines. Spinello [5] used geometric properties
to recognize contour lines based on global topology, using Delaunay
triangulation to thin and vectorize the contour lines. Zhou and Zhen [6]
proposed a deformable model and field flow orientation method for extracting
contour lines. Dongjun et al. [7] suggested a method based on the Generalized
Gradient Vector Flow (GGVF) snake model to extract contour lines. In this paper
we extend the work of Dongjun et al. [7] to trace the contour lines more
efficiently and automatically using the Modified Moore’s Neighbor tracing
algorithm. It also prepares a database of these contour lines to feed the
elevation values interactively. Since the topology of contour lines is well
defined, i.e. a set of non-intersecting closed lines, the tracing of contour
lines becomes simpler.
There exist many contour tracing algorithms, such as Square tracing, Moore
neighbor, Radial sweep and Theo Pavlidis’ tracing [8], but each algorithm has
its own pros and cons. Most of these algorithms fail to trace the contours of a
large class of patterns due to their special kind of connectivity, i.e. the
contour family of 8-connected patterns (that are not 4-connected). A
disadvantage of these algorithms is that they do not trace holes present in the
pattern. Hole searching algorithms must first be used to extract holes, and
tracing algorithms are then applied to each hole in order to trace the complete
contour. Another problem with these algorithms is defining the stopping
criterion for terminating the algorithm.
Begin
Set B to be empty.
From bottom to top and left to right scan the cells of T until a pixel, s, of P is found.
Set the current pixel point, c, to s i.e. c = s.
While c is not in B do
If the hue of c is between 0 and 0.11 and the saturation of c is between 0.2 and 0.7
o Insert c in B.
End if
Advance c to the next pixel in P.
End while
End
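The hue/saturation test in the loop above can be expressed with Python's standard colorsys module. This is our own sketch, under the assumption (suggested by the 0-0.11 and 0.2-0.7 ranges) that hue and saturation are normalized to [0, 1]:

```python
import colorsys

def is_contour_pixel(r, g, b):
    """RGB values in [0, 1]; keep pixels whose hue falls in the brown
    range 0 to 0.11 and whose saturation falls in 0.2 to 0.7."""
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return 0.0 <= h <= 0.11 and 0.2 <= s <= 0.7
```

A brownish pixel such as (0.6, 0.35, 0.2) passes both tests, while a gray pixel fails on saturation and a blue pixel fails on both, which is how the brown contour lines are separated from the rest of the map.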
The segmented information includes contours and altitude information. The
filtered or segmented image is then thinned using the morphological thinning
algorithm [9] given below.
In the first sub-iteration, delete pixel p from the first subfield if and only if the conditions
G1, G2, and G3 are all satisfied.
In the second sub-iteration, delete pixel p from the second subfield if and only if the
conditions G1, G2, and G3' are all satisfied.
Condition G1:

X_H(p) = 1                                                    (1)

where

X_H(p) = sum over i = 1..4 of b_i                             (2)

b_i = 1 if x_(2i-1) = 0 and (x_(2i) = 1 or x_(2i+1) = 1);
b_i = 0 otherwise                                             (3)

x1, x2, ..., x8 are the values of the eight neighbors of p, starting with the
east neighbor and numbered in counter-clockwise order (with x9 = x1).

Condition G2:

2 <= min{n1(p), n2(p)} <= 3                                   (4)

where

n1(p) = sum over k = 1..4 of (x_(2k-1) OR x_(2k))             (5)

n2(p) = sum over k = 1..4 of (x_(2k) OR x_(2k+1))             (6)

Condition G3:

(x2 OR x3 OR NOT x8) AND x1 = 0                               (7)

Condition G3':

(x6 OR x7 OR NOT x4) AND x5 = 0                               (8)
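The conditions of this thinning algorithm can be checked pixel-wise. The following is our own direct transcription (the function name is hypothetical); neighbors x1..x8 are passed counter-clockwise starting from the east neighbor, with x9 wrapping around to x1:

```python
def thinning_conditions(x):
    """x is a list [x1..x8] of 0/1 neighbor values, counter-clockwise
    from the east neighbor. Returns (G1, G2, G3, G3')."""
    def n(i):                      # 1-based access with wrap-around (x9 = x1)
        return x[(i - 1) % 8]
    # G1: crossing number X_H(p) must equal 1
    xh = sum(1 for i in (1, 2, 3, 4)
             if n(2 * i - 1) == 0 and (n(2 * i) == 1 or n(2 * i + 1) == 1))
    g1 = (xh == 1)
    # G2: 2 <= min(n1, n2) <= 3
    n1 = sum((n(2 * k - 1) or n(2 * k)) for k in (1, 2, 3, 4))
    n2 = sum((n(2 * k) or n(2 * k + 1)) for k in (1, 2, 3, 4))
    g2 = 2 <= min(n1, n2) <= 3
    # G3 / G3': directional conditions for the two subfields
    g3 = ((n(2) or n(3) or (1 - n(8))) and n(1)) == 0
    g3p = ((n(6) or n(7) or (1 - n(4))) and n(5)) == 0
    return g1, g2, g3, g3p
```

For example, a pixel whose east, north-east and north neighbors are set satisfies G1, G2 and G3', so it would be deleted in the second sub-iteration, while an end-of-line pixel with a single neighbor fails G2 and is preserved — which is what keeps thinning from eroding the contour line ends.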
The processed image thus obtained contains broken contour lines, so we have
used the broken contour line reconnection algorithm [7] based on GGVF to
connect the gaps in the contour lines.
Output: A sequence B(b1, b2, ..., bk) of boundary pixels, i.e. the contour
line. We define M(p) to be the Moore neighborhood of pixel p; c denotes the
current pixel under consideration, i.e. c is in M(p).
Begin
Set B to be empty.
From bottom to top and left to right scan the cells of T until a black pixel, s, of P is found.
Insert s in B.
Set the current boundary point, p, to s i.e. p = s.
Set c to be the next clockwise pixel in M(p).
While c is not in B do
If c is black
o Insert c in B.
o Set p=c.
End if
Advance c to the next clockwise pixel in M(p).
End while
Set B to be empty.
Insert s in B.
Set p=s.
Set c to the next anticlockwise pixel in M(p).
While c is not in B do
If c is black
o Insert c in B.
o Set p=c.
End if
Advance c to the next anticlockwise pixel in M(p).
End while
End
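For single-pixel-wide curves, the visited-set idea underlying the modified algorithm can be sketched as follows. This is our own simplified illustration: it follows the first unvisited black Moore neighbor in clockwise order and stops when none remains, rather than reproducing the paper's two-pass clockwise/anticlockwise scheme exactly:

```python
# Moore neighborhood offsets (row, col), clockwise starting from the east
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1),
           (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def find_start(img):
    """Scan bottom to top and left to right for the first black (1) pixel."""
    for r in range(len(img) - 1, -1, -1):
        for c in range(len(img[r])):
            if img[r][c] == 1:
                return (r, c)
    return None

def trace_single_pixel_curve(img):
    """Trace a one-pixel-wide curve: repeatedly move to the first black,
    not-yet-visited Moore neighbor of the current pixel."""
    s = find_start(img)
    if s is None:
        return []
    B, p = [s], s
    visited = {s}
    while True:
        for dr, dc in OFFSETS:
            q = (p[0] + dr, p[1] + dc)
            if (0 <= q[0] < len(img) and 0 <= q[1] < len(img[0])
                    and img[q[0]][q[1]] == 1 and q not in visited):
                visited.add(q)
                B.append(q)
                p = q
                break
        else:
            return B   # no unvisited black neighbor left: curve fully traced
```

Because termination is driven by the visited set rather than by re-reaching the start pixel, the stopping criterion needs no landmark pixel, mirroring the key point of the modified algorithm; the price, as noted below, is the membership check on every candidate neighbor.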
The topographic map for the study area is on a scale of 1:250,000. Figure 3(a)
is the topographic map of the study area. Figure 3(b) is the result of applying
the color segmentation algorithm. Figure 3(c) is the result of applying the
broken contour line reconnection algorithm based on GGVF, followed by thinning.
Figure 3(d) is the result of Moore Neighbor tracing using the Jacob stopping
criterion, and Figure 3(e) is the result of the Modified Moore Neighbor tracing
algorithm. Table 1 is the database prepared for the contour map traced using
the proposed method.
The efficiency of any tracing algorithm depends entirely on the choice of
stopping criterion. The original Moore Neighbor tracing algorithm with the
Jacob stopping criterion needs N + (n-1) * (N-1) pixels to be traversed, where
n is the number of times the start pixel is visited and N is the number of
black pixels that form a contour line. The choice of scanning anticlockwise
after we return to the start pixel in our algorithm is to avoid detection of
black pixels already encountered in the clockwise scanning. Since we do not use
backtracking, for every detection of a black pixel there is a maximum overhead
of checking 6 pixel locations (worst case) before finding a black pixel. With
the Moore Neighbor algorithm, since the algorithm has to retrace to the start
pixel, there is an overhead of redetecting every already traced pixel.
In the Modified Moore Neighbor algorithm we have removed the dependency on
reaching the start pixel in order to stop the algorithm, i.e. the start pixel
is no longer required as a landmark to indicate the end of the algorithm. The
proposed algorithm does not require a hole searching algorithm to detect holes
in the input pattern. The drawback of this algorithm, however, is the constant
checking of every pixel encountered in the Moore neighborhood to decide whether
it has been encountered before. For very large images, checking pixels every
time could be time consuming and costly. Another disadvantage of the algorithm
is that it works only on contour lines of single pixel width. Hence the
extracted contour map has to undergo thinning.
FIGURE 3: (a) Topographic map of the study area. (b) Contour extraction using
color segmentation. (c) Contours reconstructed using the broken contour line
reconnection algorithm [7] based on GGVF. (d) Result obtained using the
original Moore’s Neighbor tracing algorithm, where holes are not detected.
(e) Result obtained using the Modified Moore’s Neighbor tracing algorithm,
with holes detected.
International Journal of Image Processing (IJIP), Volume (4): Issue (2) 162
Ratika Pradhan, Shikhar Kumar, Ruchika Agarwal, Mohan P. Pradhan & M. K. Ghose
No. of contours: 42

Serial No.   Starting Point (x, y)   End Point (X, Y)   Elevation
1            15, 635                 16, 471            4000
2            15, 598                 16, 494            3600
3            15, 562                 16, 515            3200
4            15, 446                 52, 644            2800
5            15, 433                 108, 646           2400
...          ...                     ...                ...
8. ACKNOWLEDGMENT
We would like to thank the All India Council for Technical Education (AICTE),
Govt. of India, for fully sponsoring the project titled “Contour Mapping and 3D
Surface Modeling of State Sikkim” vide order no. 8023/BOR/RID/RPS-44/2008-09.
We would also like to thank Dr. A. Jeyaram, Head, Regional Remote Sensing
Service Centre (RRSSC), IIT campus, Kharagpur, for his valuable comments and
support.
9. REFERENCES
[1] F. Leberl, D. Olson, “Raster scanning for operational digitizing of graphical data”,
Photogrammetric Engineering and Remote Sensing, 48(4), pp. 615-627, 1982.
[2] D. Greenle, “Raster and Vector Processing for Scanned line work”, Photogrammetric and
Remote Sensing, 53(10), pp. 1383-1387, 1987.
[3] P. Soille, P. Arrighi, “From Scanned Topographic Maps to Digital Elevation Models”, Proc. of
Geovision, International Symposium on Imaging Applications in Geology, pp. 1-4, 1999.
[4] S. Frischknecht, E. Kanani, “Automatic Interpretation of Scanned Topographic Maps: A
Raster-Based Approach”, Proc. Second International Workshop, GREC, pp. 207-220, 1997.
[5] S. Salvatore, P. Guitton, “Contour Lines Recognition from Scanned Topographic Maps”,
Journal of WSCG, pp. 1-3, 2004.
[6] X. Z. Zhou, H. L. Zhen, “Automatic vectorization of contour lines based on deformable model
and field flow orientation”, Chinese Journal of Computers, Vol. 8, pp. 1056-1063, 2004.
[7] Dongjun Xin, X. Z. Zhou, H. L. Zhen, “Contour Line Extraction from Paper-based Topographic
Maps”.
[8] G. Toussaint, Course Notes: Grids, connectivity and contour Tracing
<http://jeff.cs.mcgill.ca/~godfried/teaching/pr-notes/contour.ps>.
[9] Lam, L., Seong-Whan Lee, and Ching Y. Suen, "Thinning Methodologies-A Comprehensive
Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 14, No. 9,
September 1992, page 879.
Hiremath P S & Kodge B G
Hiremath P. S. hiremathps@hotmail.com
Department of Computer Science
Gulbarga University
Gulbarga- 585106, Karnataka State, INDIA
Kodge B. G. kodgebg@hotmail.com
Department of Computer Science
S. V. College
UDGIR – 413517, Maharashtra State, INDIA
Abstract
In the 21st century, aerial and satellite images are information rich. They are
also complex to analyze. Many GIS features require fast and reliable extraction
of open space area from high resolution satellite imagery. In this paper we
study an efficient and reliable automatic extraction algorithm to find the open
space area in high resolution urban satellite imagery. This automatic
extraction algorithm applies filtering, segmentation and grouping to the
satellite images. The resulting images may be used to calculate the total
available open space area and the built-up area. They may also be used to
compare present and past open space area using historical urban satellite
images of the same projection.
Keywords: Automatic open space extraction, Image segmentation, Feature extraction, Remote sensing
1. INTRODUCTION
Extraction of open space area from raster images is a very important part of
GIS features such as GIS updating, geo-referencing and geo-spatial data
integration. However, extracting open space area from a raster image is a time
consuming operation when performed manually, especially when the image is
complex. The automatic extraction of open space area is critical and essential
to the fast and effective processing of large numbers of raster images in
various formats, complexities and conditions.
How well open space area can be extracted from raster images depends on how the
open space area appears in the raster image. In this paper, we study automatic
extraction of open space area from high resolution urban satellite images. A
high resolution satellite image typically has a resolution of 0.5 to 1.0 m. At
such high resolution, an open space no longer looks the same across the whole
image; instead, objects such as lakes and trees are easily identifiable. This
class of images contains very rich information and, when fused with a vector
map, can provide a comprehensive view of a geographical area. Google, Yahoo,
and Virtual Earth maps are good examples that demonstrate the power of such
high resolution images. However, high resolution images pose great challenges
for automatic feature extraction due to their inherent complexities. First, a
typical aerial photo captures everything in the area, such as buildings, cars,
trees and open space. Second, different objects are not isolated, but mixed and
interfering with each other, e.g., the shadows of trees on the road, or
building tops with similar materials. Third, roads may even look
quite different within the same image, due to their respective physical
properties. Assuming that all open space areas have the same characteristics
will fail to extract the total open space area. In addition, light and weather
conditions have a big impact on the images. Therefore, it is impossible to
predict what and where objects are, and how they will look, in a raster image.
All these uncertainties and complexities make the extraction very difficult.
Due to its importance, much effort has been devoted to this problem [3, 4].
Unfortunately, there are still no existing methods that can deal with all these
problems effectively and reliably. Some typical high resolution images are
shown in Figure 1, and they show huge differences among them in terms of color
spectrum and noise level.
There are numerous factors that can distort the edges, including but not
limited to blocking objects such as trees and shadows, and surrounding objects
in similar colors such as roof tops. As a matter of fact, the result of edge
detection is as complicated as the image itself. Edges of open space area are
either missing or broken, and straight edges correspond to buildings, as shown
in Figure 2. Therefore, edge-based extraction schemes will all fail to produce
reliable results under such circumstances.
FIGURE 1(B): High resolution urban satellite image. (Image of Latur city, dated 23 Feb. 2003)
In this paper, we develop an integrated scheme for automatic extraction that
exploits the inherent nature of open space area. Instead of relying on the
edges to detect open space, it tries to find the pixels that belong to the same
open area region based on how they are related visually and geometrically.
Studies have shown that the visual characteristics of an open space are heavily
influenced by its physical characteristics, such as material and surface
condition. It is impossible to define a common pattern based on color or
spectrum alone to uniquely identify the open space area. In our scheme, we
consider an open space as a group of “similar” pixels. The similarity is
defined by the overall shape of the region they belong to, the spectrum they
share, and the geometric property of the region. Different from edge-based
extraction schemes, the new scheme first examines the visual and geometric
properties of pixels using a new method. The pixels are identified to represent
each region. All the regions are verified against the general visual and
geometric constraints associated with an open space area. Therefore, the roof
top of a building or a long strip of trees is not misidentified as an open
space area segment. There is also no need to assume or guess the color spectrum
of open space area, which varies greatly from image to image, as the example
images show.
As illustrated by the examples in Figure 1, an open space area is not always a
contiguous region of regular linear shape, but a series of segments with no
constant shape. This is because each segment may include pixels of surrounding
objects in similar colors, or miss some of its pixels due to interference from
surrounding objects. A reliable extraction scheme must be able to deal with
such issues. In the following sections, we discuss how to capture the essence
of “similarity”, translate it, and finally turn it into a display.
The first stage extracts open space areas that are relatively easy to identify, such as major
grounds, and the second stage deals with open space areas that are harder to identify. The reason for
such a design is a balance between reliability and efficiency. Some open space areas are easier to
identify because they are more distinct and contain relatively little noise. Since open space
areas in the same image share some common visual characteristics, the information from
the already extracted areas and other objects, such as spectrum, can be used to simplify the
process of identifying open space areas that are less visible or heavily impacted by surrounding
objects or by different colors. Otherwise, such areas are not easily distinguishable from patterns
formed by other objects. For example, a set of collinear blocks may correspond to an open space
or to a group of buildings (houses) from the same block. The second stage also serves the important
purpose of filling the big gaps left in the open space extracted in stage one. Under severe noise, part
of an area may be disqualified as a valid open space region and hence missed in stage one,
leaving major gaps in the open space area edges. With the additional spectrum information,
these missed areas can easily be identified to complete the open space area extraction.
Therefore, the two-stage process eliminates the need to assume or guess the color spectrum of
open space and allows a much more complete extraction.
Each major stage consists of three major steps: filtering, segmentation, and grouping and optimization,
as shown in Figure 3. The details of each step are discussed in the following sections.
3. ALGORITHM
In this paper, we assume images satisfy the following two general assumptions. These two
assumptions are derived from the minimum conditions for an open space area to be identifiable,
and therefore are easily met by most images.
• Visual constraint: the majority of the pixels from the same open space area have a similar
spectrum that is distinguishable from most of the surrounding areas;
• Geometric constraint: an open space is a region that has no standard shape,
compared with other objects in the image.
These two constraints differ from the usual assumptions. The visual constraint does not require an
open space region to have a single color or constant intensity; it only requires the blank area to look
visually different from surrounding objects in most parts. The geometric constraint does not
require a smooth edge, only a loose constraint on the overall shape. So these conditions are
much weaker and a lot more practical. As we can see, these assumptions accommodate all
the difficult issues very well, including blurring, broken or missing edges of open area boundaries,
heavy shadows, and interfering surrounding objects.
3.1 Filtering
The filtering step identifies the key pixels that help determine whether the region they belong to
is likely an open space area segment. Based on the visual constraint, it is possible
to establish an image segmentation using edge detection methods. Note that such a separation
of regions is not required to be precise and normally contains quite a lot of noise. At best, the
boundaries between regions are a set of line segments for most images, as in the case shown in
Figure 2. The extracted edges alone certainly do not tell which region corresponds to an open
space area and which does not. As a matter of fact, most of the regions are not completely separated by
edges and are still interconnected through 4-connected or 8-connected paths. In order to fully
identify and separate open space regions from the rest of the image, we propose to invert the image and
again extract the edges using the Sobel edge detector, which highlights sharp changes in intensity in the
active image or selection. Two 3x3 convolution kernels (shown below) are used to generate the
vertical and horizontal derivatives. The final image is produced by combining the two derivatives
using the square root of the sum of their squares.
Vertical derivative:    Horizontal derivative:
 1  2  1                 1  0 -1
 0  0  0                 2  0 -2
-1 -2 -1                 1  0 -1
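The kernel combination just described can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the tiny step-edge image is a made-up example.

```python
import numpy as np

# The two 3x3 Sobel kernels from the text: GY responds to vertical
# intensity changes (horizontal edges), GX to horizontal ones.
GY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
GX = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)

def sobel_magnitude(img):
    """Combine the two derivatives as sqrt(gx^2 + gy^2) per pixel."""
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gy = np.sum(GY * patch)
            gx = np.sum(GX * patch)
            out[y, x] = np.hypot(gx, gy)
    return out

# A vertical step edge: left half dark, right half bright.
img = np.zeros((5, 6))
img[:, 3:] = 255.0
edges = sobel_magnitude(img)
```

The response peaks on the two columns straddling the intensity step and is zero in the flat regions.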
The next step is a removal of outliers, whereby a pixel is replaced by the median of the pixels
in its surrounding neighborhood if it deviates from the median by more than a certain value (the threshold).
We used the following values for outlier removal.
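The outlier-removal rule can be sketched as below. This is a hedged illustration: the 3x3 window and the threshold value of 50 are assumptions, since the paper's actual values are not reproduced here.

```python
import numpy as np

def remove_outliers(img, threshold=50.0):
    """Replace a pixel by the median of its 3x3 neighborhood when it
    deviates from that median by more than `threshold`."""
    h, w = img.shape
    out = img.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            med = np.median(img[y - 1:y + 2, x - 1:x + 2])
            if abs(img[y, x] - med) > threshold:
                out[y, x] = med
    return out

# A flat region with a single salt-noise pixel in the middle.
img = np.full((5, 5), 10.0)
img[2, 2] = 255.0
cleaned = remove_outliers(img)
```

The noise pixel is pulled back to the neighborhood median while the flat background is untouched.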
3.2 Segmentation
The segmentation step verifies which regions are possible open space regions based on their central
pixels. Central pixels contain not only the centerline information of a region, but also information about its overall
geometric shape. For example, a perfect square has only one central pixel, at its center, while a
long narrow strip region has a large number of central pixels. Only regions with ratios above
certain thresholds are considered candidate regions. In order to filter out interference as
much as possible for reliable extraction during the first major stage, a minimum region width can
be imposed. This effectively removes most of the random objects from the image. However,
this width constraint is removed during the second major stage, since improper regions can also
be filtered out based on the color spectrum information obtained from stage one. Therefore, small
regions with a close spectrum are examined as possible small open space areas in the second
stage.
In addition to the geometric constraint, some general visual information can also be applied to
filter out obvious non-open space regions. For example, if the image is a color image, most of the
tree and grass areas are greenish. Also, tree areas usually contain much richer textures than
normal smooth surfaces. Intensity transformation and spatial- and frequency-domain filtering can
be used to filter out such areas. The minimal assumptions of the proposed scheme do not
exclude the use of additional visual information to further improve the quality of extraction when it
is available.
3.3 Grouping and Optimization
The purpose of this step is to group corresponding open space area segments together in order
to find the optimal results for the required area extraction. If enough information is available to
determine the open space area spectrum, then optimization is better applied after all the
segments are identified. Figure 5 shows the result of thresholding, with automatically or interactively
set lower = 0 and upper = 48 threshold values, segmenting the image into features of interest and
background. The thresholded features are displayed in white and the background in red,
i.e., the total open space area in the given projected satellite image.
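The two-level thresholding described above amounts to a simple band test per pixel. The sketch below is illustrative: the grayscale array is made up; only the lower = 0 and upper = 48 bounds come from the text.

```python
import numpy as np

def threshold_open_space(gray, lower=0, upper=48):
    """Pixels whose intensity falls in [lower, upper] are marked as
    features of interest (open space); the rest is background."""
    return (gray >= lower) & (gray <= upper)

# A made-up 3x3 grayscale patch.
gray = np.array([[10, 48, 49],
                 [0, 120, 30],
                 [200, 47, 255]])
mask = threshold_open_space(gray)
```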
FIGURE 6: Extracted open space area (Red color) using available historical images.
The extracted open space area from high resolution urban satellite imagery is shown in Figure 7
with region-wise numbers. The labels, areas and centroids of all available open space regions are
calculated and shown in Table 1.
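Region labels, areas and centroids of the kind reported in the tables can be computed with a standard connected-component pass. The sketch below uses a simple 4-connected flood fill on a toy mask; it is an illustration, not the authors' implementation.

```python
import numpy as np
from collections import deque

def label_regions(mask):
    """4-connected component labeling; returns a label image plus
    per-label area and centroid (x1, y1)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    stats = {}
    current = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                current += 1
                labels[sy, sx] = current
                queue = deque([(sy, sx)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                stats[current] = {"area": len(pixels),
                                  "centroid": (sum(xs) / len(xs), sum(ys) / len(ys))}
    return labels, stats

# Toy binary mask with two separate regions.
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)
labels, stats = label_regions(mask)
```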
Table 1 below shows the details of the calculated areas and centroids (x1, y1) of the labeled regions of
Figure 7.
Labels | Area | Centroid x1 | Centroid y1
Table 2 below shows the details of the calculated areas and centroids (x1, y1) of the labeled regions of
Figure 8.
Labels | Area | Centroid x1 | Centroid y1
Table 3 below shows the details of the calculated areas and centroids (x1, y1) of the labeled regions of
Figure 8.
Labels | Area | Centroid x1 | Centroid y1
Based on the extracted areas from Tables 1, 2 and 3, a comparative study of the open space areas from
the existing historical images of the years 2003, 2006 and 2008 is demonstrated in the graph below.
[Bar chart: extracted area (y-axis, 0 to 40000) for labeled regions 1 to 7 (x-axis), compared across the years 2003, 2006 and 2008.]
FIGURE 10: Comparative Results of the Extracted Open Space Areas of the 2003, 2006 and 2008 Imagery.
4. CONCLUSION
In this paper, we proposed a new automatic system for extracting open space areas and
intersections from high resolution aerial and satellite images. The main contribution of the
proposed system is to address the major issues that cause existing extraction
approaches to fail, such as blurred boundaries, interfering objects, inconsistent area profiles,
heavy shadows, etc. To address these difficult issues, we developed a new method, namely
automatic extraction of open space area from high resolution satellite imagery, to capture the
essence of both the visual and geometric characteristics of open space areas. The extraction process
includes filtering, segmentation, and grouping and optimization; together these steps eliminate the
need to assume or guess the color spectrum of different open space areas. The proposed
approach is efficient, reliable, and assumes no prior knowledge about the required open space
area, its condition or the surrounding objects. It is able to process complicated aerial/satellite images
from a variety of sources, including aerial photos from Google and Yahoo online maps. One quick
application of the proposed study is to help in landing helicopters in open space areas.
Sanghamitra Mohanty & Himadri Nandini Das Bebartta
Abstract
In most of our official papers and school textbooks, it is observed that English words
are interspersed within the Indian languages. So there is a need for an Optical
Character Recognition (OCR) system which can recognize these bilingual
documents and store them for future use. In this paper we present an OCR system
developed for the recognition of printed documents in an Indian language, Oriya,
together with Roman script. For this purpose, it is necessary to separate the different scripts
before feeding them to their individual OCR systems. Firstly, we need to correct
the skew, followed by segmentation. Here we propose line-wise script differentiation.
We emphasize the upper and lower matras that are associated with Oriya and
absent in English. We have used the horizontal histogram to distinguish lines
belonging to different scripts. After separation, the different scripts are sent to their
individual recognition engines.
Keywords: Script separation, Indian script, Bilingual (English-Oriya) OCR, Horizontal profiles
1. INTRODUCTION
Researchers have been putting a great deal of effort into pattern recognition for decades.
Within the pattern recognition field, Optical Character Recognition is the oldest subfield and
has achieved considerable success in the recognition of monolingual scripts. In India,
there are 24 official (Indian constitution accepted) languages. Two or more of these languages
may be written in one script, and twelve different scripts are used for writing these languages. Under
the three-language formula, some Indian documents are written in three languages, namely
English, Hindi and the state official language. One of the important tasks in machine learning is
the electronic reading of documents. All official documents, magazines and reports can be
converted to electronic form using a high performance Optical Character Recognizer (OCR). In
the Indian scenario, documents are often bilingual or multilingual in nature. English, being the
link language in India, is used in most of the important official documents, reports, magazines and
technical papers in addition to an Indian language. Monolingual OCRs fail in such contexts and
there is a need to extend the operation of current monolingual systems to bilingual ones. This
paper describes one such system, which handles both Oriya and Roman scripts. Recognition of
bilingual documents can be approached by the following method, i.e., recognition via script
identification. Optical Character Recognition (OCR) of such a document page can be
carried out by first developing a script separation scheme to identify the different scripts present
in the document pages and then running the individual OCR developed for each script's alphabet.
Development of a generalized OCR system for Indian languages is more difficult than development
of a single-script OCR, because of the large number of characters in each Indian script
alphabet. The second option is therefore simpler for a country like India with its many
scripts. There are many pieces of work on script identification from a single document. Spitz [1]
developed a method for separating Han-based and Latin-based scripts. He used the optical
density distribution of characters and frequently occurring word shape characteristics for this
purpose. Recently, using fractal-based texture features, Tan [5] described an automatic method
for identification of Chinese, English, Greek, Russian, Malayalam and Persian text. Ding et al. [3]
proposed a method for separating two classes of scripts: European (comprising Roman and
Cyrillic scripts) and Oriental (comprising Chinese, Japanese and Korean scripts). Dhanya and
Ramakrishnan [9] proposed a Gabor filter based technique for word-wise segmentation of
bilingual documents containing English and Tamil scripts. Using cluster-based templates, an
automatic script identification technique has been described by Hochberg et al. [4]. Wood et al.
[2] described an approach using filtered pixel projection profiles for script separation. Pal and
Chaudhuri [6] proposed a line-wise script identification scheme for tri-language (triplet)
documents. Later, Pal et al. [7] proposed a generalized scheme for line-wise script identification
from a single document containing all twelve Indian scripts. Pal et al. [8] also proposed some
work on word-wise identification from Indian script documents.
All the above pieces of work deal with script separation from printed documents. In the
proposed scheme, at first the document's noise is cleaned, which we perform at the binarization
stage, and then the skew is detected and corrected. Using the horizontal projection profile the
document is segmented into lines. The line height differs between the individual scripts. Along with this
property, one more distinguishing property between the Roman and Oriya scripts is that each line
consists of more Roman characters than Oriya ones. Based on these
features, we have obtained a threshold value by dividing the line height of each line by the number
of characters in the line. After obtaining this value, we send each line to its respective
classifier. The classifier which we have used is the Support Vector Machine. Figure 1 below
shows the entire process carried out for the recognition of our bilingual document.
In Section 2 we describe the properties of the Oriya script. Section 3 covers a brief
description of binarization and skew correction. Section 4 describes segmentation. In
Section 5 we describe the major portion of our work, which focuses on script identification.
Section 6 gives an analysis of the further cases that we have studied for bilingual script
differentiation. Section 7 describes the feature extraction part, which has been achieved through
Support Vector Machines. Section 8 discusses the results that we have obtained.
From the above figure it can be noted that, out of 52 basic characters, 37 have a
convex shape at the upper part. The writing style of the script is from left to right. The concept of
upper/lower case is absent in Oriya script. A consonant or vowel following a consonant
sometimes takes a compound orthographic shape, which we call a compound character or
conjunct. Compound characters can be combinations of consonant and consonant, as well as
consonant and vowel.
Binarization
The input to an OCR is given from a scanner or a camera. After this we need to binarize the
image. Image enhancement is performed using the spatial domain method, which refers to the
aggregate of pixels composing an image. Spatial domain processes are denoted by the expression
O(x, y) = T[I(x, y)], where I(x, y) is the input image, O(x, y) is the processed image and T is an
operator on I. The operator T is applied at each location (x, y) to yield the output. The effect of
this transformation is to produce an image of higher contrast than the original by
darkening the levels below 'm' and brightening the levels above 'm' in the original image. Here 'm'
is the threshold value taken by us for brightening and darkening the original image. T(r) thus produces
a two-level (binary) image [10].
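The point transformation T described above amounts to the following sketch. The threshold m = 128 is an arbitrary illustrative value, not one taken from the paper.

```python
import numpy as np

def binarize(img, m=128):
    """T(r): darken levels below m to 0 and brighten levels at or
    above m to 255, producing a two-level (binary) image."""
    return np.where(img >= m, 255, 0)

# A made-up grayscale patch spanning the threshold.
img = np.array([[0, 127, 128], [200, 64, 255]])
binary = binarize(img)
```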
Skew Correction
Detecting the skew of a document image and correcting it are important issues in realizing a
practical document reader. For skew correction we have implemented Baird's algorithm, a
horizontal-profiling based algorithm. For skew detection, horizontal profiles are computed
close to the expected orientations. For each angle, a measure is made of the variation in the bin
heights along the profile, and the angle with the maximum variation gives the skew angle.
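The profile-variance search just described can be sketched as below. This is an illustrative implementation: the candidate angle range and the shear-based way of computing rotated profiles are assumptions, not details from the paper.

```python
import numpy as np

def horizontal_profile(img):
    """Number of black (1) pixels in each row."""
    return img.sum(axis=1)

def profile_variance_at(img, angle_deg):
    """Shear each column vertically by tan(angle) and measure the
    variance of the resulting horizontal profile bin heights."""
    h, w = img.shape
    sheared = np.zeros_like(img)
    t = np.tan(np.radians(angle_deg))
    for x in range(w):
        shift = int(round(x * t))
        for y in range(h):
            if img[y, x]:
                ny = y + shift
                if 0 <= ny < h:
                    sheared[ny, x] = 1
    return horizontal_profile(sheared).var()

def detect_skew(img, angles):
    """The angle whose profile shows maximum variation is the skew angle."""
    return max(angles, key=lambda a: profile_variance_at(img, a))

# A synthetic page with one text line skewed by roughly -5 degrees.
img = np.zeros((40, 40), dtype=int)
for x in range(40):
    img[20 + int(round(x * np.tan(np.radians(-5)))), x] = 1
angle = detect_skew(img, range(-10, 11))
```

Shearing by +5 degrees collapses the skewed line back onto a single row, so that angle maximizes the profile variance.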
4. SEGMENTATION
Several approaches have also been taken for segmentation of a script line-wise, word-wise and
character-wise. A new algorithm for segmentation of handwritten text in Gurmukhi script has
been given by Sharma and Singh [11]. A new intelligent segmentation technique for functional
Magnetic Resonance Imaging (fMRI) has been implemented using an Echo State Neural Network
(ESN) by D. Suganthi and S. Purushothaman [12]. A simple segmentation approach for
unconstrained cursive handwritten words in conjunction with a neural network has been
performed by Khan and Muhammad [13]. The major challenge in our work is the separation of
lines for script identification. The result of line segmentation, which is shown later, takes
into consideration the upper and lower matras of the line, and this gives the differences in line
height used for distinguishing the scripts. One more factor which we have considered for line-wise
identification of the different scripts is the horizontal projection profile, which looks into the intensity of
pixels in different zones. The horizontal projection profile is the sum of black pixels along every row of
the image. For both of the above methods we discuss the output in the script identification
section; here we discuss the concepts only.
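The horizontal projection profile defined above is simply the row-wise sum of black pixels. The sketch below uses a tiny made-up binary line image for illustration.

```python
import numpy as np

def horizontal_projection(binary_line):
    """Sum of black (1) pixels along every row of the image."""
    return binary_line.sum(axis=1)

# A toy text line: dense middle (busy) zone, lighter upper/lower zones,
# as with Oriya matras extending above and below the busy zone.
line = np.array([[0, 1, 0, 0, 1, 0],
                 [1, 1, 1, 1, 1, 1],
                 [1, 1, 1, 1, 1, 1],
                 [0, 0, 1, 0, 1, 0]])
profile = horizontal_projection(line)
```

The profile peaks over the busy zone and falls off in the upper and lower zones, which is the signal used later for zone analysis.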
The purpose of analyzing the text line detection of an image is to identify the physical regions in
the image and their characteristics. A maximal region in an image is a maximal homogeneous
area of the image. The property of homogeneity in the case of a text image refers to the type of
region, such as text block, graphic, text line, word, etc. So we define the segmentation as follows.
A segmentation of a text line image is a set of mutually exclusive and collectively exhaustive
subregions of the text line image. Given a text line image I, a segmentation is defined as
S = {R1, R2, …, Rn}, such that
R1 ∪ R2 ∪ … ∪ Rn = I, and
Ri ∩ Rj = ϕ for i ≠ j.
Typical top-down approaches proceed by dividing a text image into smaller regions using the
horizontal and vertical projection profiles. The X-Y Cut algorithm starts by dividing a text image into
sections based on valleys in their projection profiles. The algorithm repeatedly partitions the
image by alternately projecting the regions of the current segmentation onto the horizontal and
vertical axes. An image is recursively split horizontally and vertically until a final criterion, where a
split is impossible, is met. Projection profile based techniques are extremely sensitive to the skew
of the image. Hence extreme care has to be taken while scanning images, or a reliable skew
correction algorithm has to be applied before the segmentation process.
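The recursive X-Y Cut just described can be sketched as follows. This is a simplified version: splitting at zero-valleys of the alternating profiles is one common choice, not necessarily the exact criterion used in the literature cited.

```python
import numpy as np

def runs(profile):
    """Contiguous index ranges where the profile is non-zero."""
    segs, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(profile)))
    return segs

def xy_cut(img, y0=0, x0=0, horizontal=True):
    """Alternately split at zero-valleys of the horizontal and vertical
    projection profiles; return leaf boxes (top, bottom, left, right)."""
    axis = 1 if horizontal else 0
    profile = img.sum(axis=axis)
    segs = runs(profile)
    boxes = []
    for a, b in segs:
        sub = img[a:b, :] if horizontal else img[:, a:b]
        ny0 = y0 + a if horizontal else y0
        nx0 = x0 if horizontal else x0 + a
        if len(segs) == 1 and (a, b) == (0, len(profile)):
            if horizontal:
                # no horizontal split possible; try a vertical one
                boxes += xy_cut(sub, ny0, nx0, horizontal=False)
            else:
                # no split along either axis: emit a leaf box
                boxes.append((y0, y0 + img.shape[0], x0, x0 + img.shape[1]))
        else:
            boxes += xy_cut(sub, ny0, nx0, horizontal=not horizontal)
    return boxes

# Toy page: two words on the first line, one word on the second.
img = np.zeros((6, 7), dtype=int)
img[1, 1:3] = 1
img[1, 5] = 1
img[4, 2:5] = 1
boxes = xy_cut(img)
```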
5. SCRIPT IDENTIFICATION
In a script, a text line may be partitioned into three zones. The upper zone denotes the portion
above the mean-line, the middle zone (busy zone) covers the portion of basic (and compound)
characters below the mean-line, and the lower zone is the portion below the base-line. Thus we
define the mean-line (base-line) as an imaginary line where most of the uppermost (lowermost)
points of the characters of a text line lie. An example of zoning is shown in Figure 3, and
Figures 4a and 4b show a word each of Oriya and English with their corresponding projection
profiles. Here the mean-line along with the base-line partitions the text line into three zones.
For example, from Figure 4 shown below we can observe that the percentage of pixels in the
lower zone is higher for Oriya characters than for English characters.
In this approach, script identification is first performed at the line level and this knowledge is used
to identify the OCR to be employed. Individual OCRs have been developed for Oriya [14] as well
as English and these could be used for further processing. Such an approach allows the Roman
and Oriya characters to be handled independently of each other. In most Indian languages, a
text line may be partitioned into three zones. We call the uppermost and lowermost boundary
lines of a text line the upper-line and lower-line, respectively.
FIGURE 4: The Three Zones of (a) Oriya Word and (b) English Word.
For script recognition, features are identified based on the following observations from the
above projection profiles:
1. The number of Oriya characters present in a line is comparatively smaller than the number of
Roman characters.
2. All the upper-case letters in Roman script extend into the upper zone and middle zone, while
the lower-case letters occupy the middle, lower and upper zones.
3. Roman script has very few downward extensions (only for g, p, q, j, and y) and a low
range of pixel density there, whereas most Oriya lines contain lower matras and have a
high range of pixel density.
4. Few Roman letters (considering the lower-case letters) have upward extensions
(only b, d, f, h, k, l, and t) and the pixel density there is low, whereas most
Oriya lines contain upper vowel markers (matras) and have a high range of pixel
density.
5. The upper portion of most Oriya characters is convex in nature and touches the mean-line,
while Roman script is dominated by vertical and slant strokes.
Taking the above distinguishing features into consideration, we have tried to separate the scripts on the
basis of line height. Figure 5 shows the different lines extracted for the individual scripts. Here
we have considered the upper and lower matras of the Oriya characters. We have observed that,
for a suitably chosen threshold value of the line height, the English lines of a document have a
line height less than the threshold value and the Oriya lines have a height greater than the
threshold value.
FIGURE 5: Extracted Lines with Their Upper and Lower Matras.
For each of the lines shown above, the number of characters present in the line has been
calculated. Then a threshold value 'R' for both scripts has been calculated by dividing the line
height of each line by the number of characters present in that line. Thus, R can be written as
R = line height / number of characters in the line.
The values that we obtained are shown in Table 2. From these values we can see that for
Oriya script the value lies above 3.0 and for Roman it is below 3.0. So, based on these values, the
scripts have been separated.
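The ratio test described above can be sketched as below. The per-line measurements are hypothetical examples; only the 3.0 threshold comes from the text.

```python
def classify_line(line_height, num_chars, threshold=3.0):
    """R = line height / number of characters; lines with R above the
    threshold go to the Oriya classifier, the rest to the Roman one."""
    r = line_height / num_chars
    return "Oriya" if r > threshold else "Roman"

# Hypothetical measurements for two segmented lines.
oriya = classify_line(64, 18)   # R is about 3.56
roman = classify_line(40, 22)   # R is about 1.82
```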
TABLE 2: The Ratio Obtained after Dividing Line Height with Number of Characters.
We have taken nearly fifteen hundred printed documents for comparing the outputs and deriving a
conclusion. The above table and figure are presented for one of the documents used while
carrying out our experiment.
6. FEATURE EXTRACTION
The two essential sub-stages of recognition phase are feature extraction and classification. The
feature extraction stage analyzes a text segment and selects a set of features that can be used to
uniquely identify the text segment. The derived features are then used as input to the character
classifier. The classification stage is the main decision-making stage of an OCR system and uses
the extracted features as input to identify the text segment according to preset rules.
Performance of the system largely depends upon the type of the classifier used. Classification is
usually accomplished by comparing the feature vectors corresponding to the input text/character
with the representatives of each character class, using a distance metric. The classifier which has
been used by our system is Support Vector Machine (SVM).
yi D(xi) ≥ 1 − ξi for i = 1, …, M.
Here ξi are nonnegative slack variables. The distance between the separating hyperplane D(x) =
0 and the training datum, with ξi = 0, nearest to the hyperplane is called the margin. The hyperplane
D(x) = 0 with the maximum margin is called the optimal separating hyperplane. To determine the
optimal separating hyperplane, we minimize
(1/2) ‖w‖² + C Σi ξi,
where C is the margin parameter that determines the trade-off between maximization of the
margin and minimization of the classification error. The data that satisfy the equality in (4) are
called support vectors.
To enhance separability, the input space is mapped into a high-dimensional dot-product space
called the feature space. Let the mapping function be g(x). If the dot
product in the feature space is expressed by H(x, x′) = g(x)ᵀg(x′), then H(x, x′) is called the kernel function,
and we do not need to treat the feature space explicitly. The kernel functions used in this study
are as follows:
are as follows:
1. Dot-product kernels
H(x, x′) = xᵀx′
2. Polynomial kernels
H(x, x′) = (xᵀx′ + 1)^d,
where d is an integer.
Let the decision function for class i against class j, with the maximum margin, be
Dij(x) = wijᵀ g(x) + bij,
where wij is an m-dimensional vector, bij is a scalar, and Dij(x) = −Dji(x).
The class decision for the input x is then computed from
Di(x) = Σ j≠i sign(Dij(x)),
where
sign(x) = 1 for x ≥ 0, and −1 otherwise.
If x ∈ Ri, Di(x) = n − 1 and Dk(x) < n − 1 for k ≠ i. Thus x is classified into class i. But if any of the
Di(x) is not n − 1, (12) may be satisfied for plural i's. In this case, x is unclassifiable. If the decision
functions for a three-class problem are as shown in Figure 6, the shaded region is unclassifiable
since Di(x) = 1 (i = 1, 2, and 3).
Figure 7 shows that x does not belong to class i. As the top-level classification, we can choose
any pair of classes. Except at the leaf nodes, if Dij(x) > 0 we consider that x does not belong
to class j, and if Dij(x) < 0, not to class i. Then if D12(x) > 0, x does not belong to Class 2; thus it
belongs to either Class 1 or 3, and the next classification pair is Classes 1 and 3. The
generalization regions become as shown in Figure 8. Unclassifiable regions are resolved, but
clearly the generalization regions depend on the tree structure.
Classification by a DDAG is executed by list processing. First, we generate a list
with class numbers as elements. Then we calculate the decision function, for the input x,
corresponding to the first and last elements. Let these classes be i and j, with Dij(x) > 0; we then
delete element j from the list. We repeat the above procedure until one element is left, and
classify x into the class that corresponds to that element number. For Figure 7, we generate the
list {1, 3, 2}. If D12(x) > 0, we delete element 2 from the list, obtaining {1, 3}. Then if D13(x) >
0, we delete element 3 from the list. Since only 1 is left in the list, we classify x into Class 1.
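The list-processing procedure just described can be sketched as follows. This is a generic illustration; the decision function here is a hypothetical stand-in, not a trained SVM.

```python
def ddag_classify(classes, decision):
    """Classify by repeatedly evaluating D_ij for the first and last list
    elements and deleting the losing class until one element remains.
    decision(i, j) returns D_ij(x); a positive value favors class i."""
    lst = list(classes)
    while len(lst) > 1:
        i, j = lst[0], lst[-1]
        if decision(i, j) > 0:
            lst.pop()        # x does not belong to class j
        else:
            lst.pop(0)       # x does not belong to class i
    return lst[0]

# A toy 3-class problem where the smaller class index always wins,
# mirroring the {1, 3, 2} example from the text.
def toy_decision(i, j):
    return 1.0 if i < j else -1.0

label = ddag_classify([1, 3, 2], toy_decision)
```

As in the text's walk-through, element 2 is deleted first, then element 3, leaving Class 1; only n − 1 decision evaluations are needed.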
Training of a DDAG is the same as for conventional pairwise support vector machines; namely,
we need to determine n(n − 1)/2 decision functions for an n-class problem. The advantage of
DDAGs is that classification is faster than with conventional pairwise support vector machines or
pairwise fuzzy support vector machines: in a DDAG, classification can be done by calculating only
(n − 1) decision functions [24].
We have made use of a DDAG support vector machine for the recognition in our OCR engine.
Below we show the types of samples used for training and testing and the accuracy rates which
we have obtained for the training characters.
7. RESULTS
A corpus for Oriya OCR consisting of a database of machine-printed Oriya characters has been
developed. Different samples for both scripts have been collected, mainly from laser-printed
documents, books and newspapers containing variable font styles and sizes. A scanning
resolution of 300 dpi is employed for digitization of all the documents.
Figures 9 and 10 show some sample characters of various fonts of both Oriya and Roman
script used in the experiment.
We have performed experiments with different types of images such as normal, bold, thin, small,
big, etc., having varied sizes of Oriya and Roman characters. The training and testing set
comprises more than 10,000 samples. We have considered grayscale images for collection of
the samples. This database can be utilized for the purposes of document analysis, recognition
and examination. The training set consists of binary images of 297 Oriya letters and 52 English
letters, including both the lower and upper case. We have kept the same data file for
testing and training for all the different classifiers in order to analyze the results. In most of the
documents the occurrences of Roman characters are very few compared to those of Oriya
characters. For this reason, we have collected more samples of Oriya characters than of English
for training purposes.
FIGURE 10: Samples of Machine Printed Roman Characters Used For Training.
The table below shows the effect on accuracy of considering different character sizes with different
types of images used for Oriya characters.
TABLE 2: Effect on Accuracy by Considering Different Character Sizes with Different Types of the Images
used for Oriya Characters.
Table 3 below shows the recognition accuracy for Roman characters with normal, large, bold and
small fonts; large sizes give better accuracy than the other fonts.
Font style    Accuracy
Large         92.13%
Normal        87.78%
TABLE 3: Recognition Accuracy for Roman Characters with Different Font Styles.
Regarding the effect of character size and image type on accuracy: for bold and large Oriya
characters the accuracy rate is highest, at nearly 99.8 percent, and it decreases for thin and
small characters.
Figure 11 shows an example of a typical bilingual document used in our work. As discussed in the
script identification section, almost all of the Oriya lines are associated with lower and upper
matras.
After the separated scripts are sent to their respective classifiers, the final result is shown in
Figure 12. Figure 11 shows one of the images taken for testing: the upper portion of the image
contains the English script and the lower half contains the Oriya script. For this image, the
corresponding output is shown in Figure 12.
8. CONCLUSION
This paper has presented a novel method for script separation. We distinguish between English and
Oriya text by means of horizontal projection profiles of pixel intensity in different zones, along
with the line height and the number of characters present in the line. Separating the scripts is
preferred because training both scripts in a single recognition system decreases the accuracy
rate: some Oriya characters can be confused with similar Roman characters, and the same problem
arises during post-processing. We recognize the Oriya and Roman scripts with two separate training
sets using Support Vector Machines, and the recognized characters are finally merged into a single
editor. Improved accuracy is always desired, and we are working toward it by improving every
processing stage: preprocessing, feature extraction, sample generation, classifier design,
multiple-classifier combination, etc. Selecting features and designing classifiers jointly also
leads to better classification performance. Multiple classifiers are being applied to increase the
overall accuracy of the OCR system, since it is difficult to optimize performance with a single
classifier over a large feature vector set. The present OCR system deals with clean machine-printed
text with minimal noise, where the input text is printed in a non-italic, non-decorative regular
font at standard sizes. In future, this work can be extended to a bilingual OCR for degraded and
noisy machine-printed text and for italic text, and also to handwritten text. A post-processor for
both scripts can be developed to increase the overall accuracy. In view of these open problems,
steps are being taken to further refine our bilingual OCR.
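The script-separation cue summarized above, the distribution of ink across horizontal zones of a text line, can be sketched as follows. The zone width and the 0.25 threshold are illustrative values, not our tuned parameters.

```python
# Sketch of horizontal-projection-profile script identification: Oriya
# lines carry matras above and below the core band, so ink in the top
# and bottom zones relative to the middle zone hints at Oriya script.
import numpy as np

def horizontal_profile(line_img):
    """Row-wise count of foreground pixels for one binary text-line
    image (1 = ink)."""
    return line_img.sum(axis=1)

def looks_like_oriya(line_img, band=0.2):
    """Heuristic: compare ink mass in the top and bottom bands of the
    line against the middle band (thresholds are illustrative)."""
    profile = horizontal_profile(line_img)
    h = len(profile)
    top = profile[: int(band * h)].sum()
    bottom = profile[int((1 - band) * h):].sum()
    middle = profile[int(band * h): int((1 - band) * h)].sum()
    return bool((top + bottom) > 0.25 * middle)
```

A Roman text line concentrates almost all of its ink in the middle band, so the ratio stays low, while the matras of an Oriya line push it above the threshold.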
ACKNOWLEDGEMENT
We are thankful to DIT, MCIT for its support and to our colleague Mr. Tarun Kumar Behera for his
cooperation.
9. REFERENCES
1. A. L. Spitz. "Determination of the Script and Language Content of Document Images". IEEE
Trans. on PAMI, 235-245, 1997.
5. T. N. Tan. "Rotation Invariant Texture Features and their Use in Automatic Script
Identification". IEEE Trans. on PAMI, 751-756, 1998.
6. S. Wood, X. Yao, K. Krishnamurthi, and L. Dang. "Language Identification for Printed Text
Independent of Segmentation". In Proc. Int'l Conf. on Image Processing, 428-431, 1995.
7. U. Pal and B. B. Chaudhuri. "Script Line Separation from Indian Multi-Script Documents".
IETE Journal of Research, 49, 3-11, 2003.
9. S. Chanda and U. Pal. "English, Devnagari and Urdu Text Identification". Proc. International
Conference on Cognition and Recognition, 538-545, 2005.
10. S. Mohanty, H. N. Das Bebartta, and T. K. Behera. "An Efficient Bilingual Optical Character
Recognition (English-Oriya) System for Printed Documents". Seventh International
Conference on Advances in Pattern Recognition (ICAPR), 398-401, 2009.
12. D. Suganthi and S. Purushothaman. "fMRI Segmentation Using Echo State Neural Network".
Computers & Security, 2(1):1-9, 2009.
14. S. Mohanty and H. K. Behera. "A Complete OCR Development System for Oriya Script".
Proceedings of SIMPLE'04, IIT Kharagpur, 2004.
16. V. N. Vapnik. "The Nature of Statistical Learning Theory". Springer-Verlag, London, UK, 1995.
17. V. N. Vapnik. "Statistical Learning Theory". John Wiley & Sons, New York, 1998.
19. U. H.-G. Kreßel. "Pairwise Classification and Support Vector Machines". In B. Schölkopf,
C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods: Support Vector
Learning, pages 255-268. The MIT Press, Cambridge, MA, 1999.
20. J. C. Platt, N. Cristianini, and J. Shawe-Taylor. "Large Margin DAGs for Multiclass
Classification". In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural
Information Processing Systems 12, pages 547-553. The MIT Press, Cambridge, MA, 2000.
21. B. Kijsirikul and N. Ussivakul. "Multiclass Support Vector Machines Using Adaptive Directed
Acyclic Graph". In Proceedings of the International Joint Conference on Neural Networks
(IJCNN 2002), 980-985, 2002.
22. S. Abe and T. Inoue. "Fuzzy Support Vector Machines for Multiclass Problems". In
Proceedings of the Tenth European Symposium on Artificial Neural Networks (ESANN 2002),
116-118, Bruges, Belgium, 2002.
23. K. P. Bennett. "Combining Support Vector and Mathematical Programming Methods for
Classification". In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel
Methods: Support Vector Learning, pages 307-326. The MIT Press, Cambridge, MA, 1999.
24. J. Weston and C. Watkins. "Support Vector Machines for Multi-Class Pattern Recognition". In
Proceedings of the Seventh European Symposium on Artificial Neural Networks (ESANN'99),
pages 219-224, 1999.
25. F. Takahashi and S. Abe. "Optimizing Directed Acyclic Graph Support Vector Machines".
ANNPR, Florence, Italy, September 2003.