
First Edition 2007

DZULKIFLI MOHAMAD & NUR ZURAIFAH SYAZRAH OTHMAN 2007

Hak cipta terpelihara. Tiada dibenarkan mengeluar ulang mana-mana bahagian artikel, ilustrasi, dan isi kandungan buku ini dalam apa juga bentuk dan cara apa jua sama ada dengan cara elektronik, fotokopi, mekanik, atau cara lain sebelum mendapat izin bertulis daripada Timbalan Naib Canselor (Penyelidikan dan Inovasi), Universiti Teknologi Malaysia, 81310 Skudai, Johor Darul Ta'zim, Malaysia. Perundingan tertakluk kepada perkiraan royalti atau honorarium.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from Universiti Teknologi Malaysia, 81310 Skudai, Johor Darul Ta'zim, Malaysia.

Perpustakaan Negara Malaysia    Cataloguing-in-Publication Data

Advances in image processing and pattern recognition : algorithms & practice / edited by: Dzulkifli Mohamad, Nur Zuraifah Syazrah Othman.
ISBN 978-983-52-0621-4
1. Image processing. 2. Pattern recognition systems.
I. Dzulkifli Mohamad, 1957-. II. Nur Zuraifah Syazrah Othman.
621.399

Editor: Dzulkifli Mohamad & Rakan
Pereka Kulit / Cover Designer: Mohd Nazir Md. Basri & Mohd Asmawidin Bidin

Diatur huruf oleh / Typeset by
Fakulti Sains Komputer & Sistem Maklumat

Diterbitkan di Malaysia oleh / Published in Malaysia by


PENERBIT UNIVERSITI TEKNOLOGI MALAYSIA

34 - 38, Jln. Kebudayaan 1, Taman Universiti, 81300 Skudai, Johor Darul Ta'zim, MALAYSIA.
(PENERBIT UTM is a member of PERSATUAN PENERBIT BUKU MALAYSIA / MALAYSIAN BOOK PUBLISHERS ASSOCIATION, membership no. 9101)

Dicetak di Malaysia oleh / Printed in Malaysia by
UNIVISION PRESS SDN. BHD

Lot. 47 & 48, Jalan SR 1/9, Seksyen 9, Jalan Serdang Raya, Taman Serdang Raya, 43300 Seri Kembangan, Selangor Darul Ehsan, MALAYSIA.

CONTENTS

Preface                                                                           vii

Chapter 1    Pre-Processing Techniques for Offline Cursive Handwriting Recognition: Recent Advances
             Amjad Rehman and Dzulkifli Mohamad

Chapter 2    Segmentation-based Offline Cursive Handwriting Recognition: Current Updates and Recent Advances      51
             Amjad Rehman and Dzulkifli Mohamad

Chapter 3    Hidden Markov Models (HMMs) Approach for Handwriting Recognition     113
             Muhammad Faisal Zafar and Dzulkifli Mohamad

Chapter 4    Segmentation of Brain MR Images                                      141
             M. Masroor Ahmed and Dzulkifli Mohamad

Chapter 5    Tissues Segmentation of Brain MR Images by Utilizing Artificial Neural Network      155
             M. Masroor Ahmed and Dzulkifli Mohamad

Chapter 6    Automatic Segmentation of White Matter Lesion MRI Brain              165
             Novanto Yudistira and Daud Daman

Chapter 7    Fingerprint Image Reconstruction and Enhancement                     175
             Ghazali Sulong, Mohamad Kharulli Othman and Khairul Azlan Ali

Chapter 8    Fingerprint Singularity and Core Point Detection                     195
             Fadzilah Ahmad and Dzulkifli Mohamad

Chapter 9    Review of Length Estimators in 2D                                    211
             Oldooz Dianat and Habibollah Haron

Chapter 10   The Characteristics and Mapping Algorithm of Rectangular Vertex Chain Code      235
             Lily Ayu Wulandhari and Habibollah Haron

Chapter 11   Geometric Reasoning Algorithm in Extraction and Recognition of 3D Object Features      255
             Zuraini Sukimin and Habibollah Haron

INDEX                                                                             267

PREFACE

This book reflects the progress made in the area of image processing and pattern recognition. Image processing and pattern recognition have been fast-growing fields for the last forty years. The advancement of computer technology in terms of hardware, software, and input peripherals has had a great impact on the progress of this field. A number of pattern recognition technologies have been developed, and several of them are being used in numerous applications, including document imaging, medical imaging, remote sensing, biometric recognition, multimedia computing, content-based image retrieval, and so on. As a result, it has become extremely important to present the state of the art of research work in this field. Basically, image processing and pattern recognition refer to the processing of digital images for two principal interests: the improvement of pictorial information for human understanding and interpretation; and the processing of image data for storage, transmission, and representation for computer perception. This book focuses on three main areas, i.e. document imaging, feature extraction techniques, and pattern


recognition algorithms. This book will be useful to researchers, lecturers, and students who wish to understand and to develop numerous applications related to computer vision systems.

Dzulkifli Mohamad
Nur Zuraifah Syazrah Othman
Department of Computer Graphics and Multimedia
Faculty of Computer Science and Information System
Universiti Teknologi Malaysia
2007

1
PRE-PROCESSING TECHNIQUES FOR OFFLINE CURSIVE HANDWRITING RECOGNITION: RECENT ADVANCES
Amjad Rehman Dzulkifli Mohamad

INTRODUCTION

The term "handwriting" is defined as meaning a surface consisting of artificial graphic marks conveying some message through the marks conventional relation to language (Plamondon and Srihari, 2000). Handwriting is one of the most important ways in which civilized people communicate. It is used both for personal (letters, notes, addresses on envelopes etc), official communications (bank checks, tax form, postal services, admission forms etc) and for communications written to ourselves (reminders, lists, diaries etc). Extensive research has been carried out in terms of technical research papers and reports by various researchers around the globe. Despite, intensive research efforts of decades, still there are no commercial solutions to deal with totally unconstrained cursive handwriting recognition on static surface such as bank checks, postal envelopes and paper-based forms (Gatos et al., 2007). The literature is replete with encouraging recognition results by the research community and research in this


area is continually progressing towards maturity. A number of research papers have been published addressing different problems associated with the process of off-line cursive handwriting recognition, such as character recognition, character segmentation, word segmentation and word recognition (Hamamura et al., 2007; Bishnu and Chaudhuri, 2007; Cheriet et al., 2007; Kapp et al., 2007; Gatos et al., 2007; Koerich et al., 2006; Gatos et al., 2006a; Günter and Bunke, 2005; Chevalier et al., 2005; Liu and Fujisawa, 2005; Marinai et al., 2005; Lee and Coelho, 2005; Suen and Tan, 2005; Schambach, 2005; Britto et al., 2004; Blumenstein et al., 2004; Günter and Bunke, 2004; Blumenstein et al., 2003; Hanmandlu et al., 2003; Camastra and Vinciarelli, 2003; Arica and Yarman-Vural, 2002; Fan and Verma, 2002; Gang et al., 2002; Arika et al., 2002; Blumenstein and Verma, 2001; Plamondon and Srihari, 2000; Blumenstein and Verma, 1999a, 1999b; Chiang, 1998; Dimauro et al., 1998; Cho, 1997; Eastwood et al., 1997; Gader et al., 1997). Additionally, several survey papers have been published to provide an update on developments in this domain (Koerich et al., 2003; Zhang and Lu, 2004; Vinciarelli, 2002; Plamondon and Srihari, 2000; Steinherz et al., 1999; Hull, 1998; Lu and Shridhar, 1996; Casey and Lecolinet, 1996; Dunn and Wang, 1992). Moreover, the international conferences on frontiers in handwriting recognition and the international journal on document analysis and recognition are continually updated with new findings. Although some researchers have presented very encouraging results for isolated alphabet and digit recognition, the same accuracy is not attainable for cursive handwriting. This is mainly due to problems inherent in cursive handwriting, such as touching, overlapped, broken, incomplete and ambiguous characters, which are the main causes of segmentation errors, as shown in figure 1.


Figure 1

Touched, overlapped and broken characters in cursive handwriting samples from the IAM database

This chapter reports on the state of the art in off-line cursive handwriting recognition research and its associated issues.

Types of Handwriting Input: On-line versus Off-line

The field of handwriting recognition can be classified in several ways, but the clearest division is between on-line (dynamic) and off-line (static) handwriting recognition strategies. The first exploits information about the time order and dynamics of the writing process captured by the writing device, as in Personal Digital Assistants (PDAs). This temporal information is an additional source of knowledge that helps to increase recognition accuracy. Off-line handwriting recognition, on the other hand, relies on more sophisticated architectures to accomplish the same recognition task (Koerich et al., 2003). Off-line handwriting recognition refers to the process of recognizing words that have been scanned from a surface (such as a sheet of paper) and stored digitally in grey-scale format. After the image has been stored in this format, it is conventional to perform further processing to allow superior recognition. In the on-line case, the handwriting is captured and stored in digital form via different means. Usually a special pen is used in conjunction with an


electronic surface. As the pen moves across the surface, the two-dimensional coordinates of successive points are represented as a function of time and are stored in order (Plamondon and Srihari, 2000). It is generally accepted that the on-line method of recognizing handwriting achieves better results than its off-line counterpart. This may be attributed to the fact that more information can be captured in the on-line case, such as the direction, speed and order of the strokes. This information is not as easy to recover from handwritten words written on a static medium such as paper.

TYPICAL HANDWRITING RECOGNITION SYSTEM

A typical segmentation-based handwriting recognition system consists of the following steps; their relationship is demonstrated in figure 2.

1. Digitization (image acquisition)
2. Preprocessing
3. Segmentation
4. Feature extraction
5. Recognition (classification)


Figure 2

Segmentation based handwritten word recognition system
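To show how these five stages fit together, the following skeleton sketches one possible arrangement in Python. Every function here is an illustrative placeholder (none of it is taken from any published system), and the array shapes are arbitrary.

import numpy as np

def digitize(path: str) -> np.ndarray:
    # placeholder: in practice the word image would be scanned or loaded from `path`
    return np.zeros((64, 256), dtype=np.uint8)

def preprocess(img: np.ndarray) -> np.ndarray:
    # noise removal, binarization, slant/skew correction, baseline normalization
    return img

def segment(img: np.ndarray) -> list:
    # split the normalized word image into candidate character images
    return [img]

def extract_features(char_img: np.ndarray) -> np.ndarray:
    # map one candidate character to a fixed-length feature vector
    return char_img.astype(float).ravel()

def recognize(features: np.ndarray) -> str:
    # classify the feature vector into a character label (stub)
    return "?"

def recognize_word(path: str) -> str:
    img = preprocess(digitize(path))
    return "".join(recognize(extract_features(c)) for c in segment(img))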

Preprocessing

The task of preprocessing relates to the removal of noise and unwanted variations in the handwritten word pattern (Brown and Ganapathy, 1983). Preprocessing may itself be broken into smaller tasks such as thresholding, slant and skew detection and removal, base-line (upper and lower) detection, smoothing and resizing. By preprocessing each handwritten word, the task of subsequent recognition is simplified (Blumenstein et al., 2002). The main goal of such preprocessing tasks is to reduce the huge variability of handwriting (Nicchiotti and Scagliola, 1999) and to make the writing style as uniform as possible (Pastor, 2004). In this regard, Watanabe et al. (1997) conducted comparative experiments showing that normalization minimizes recognition error, since the input of a cursive word recognition system is assumed to be written


horizontally, with descenders aligned along the vertical direction. Nevertheless, in the real world these rules are rarely respected (Sarfraz et al., 2007). Hence a preprocessing stage is normally included in handwriting segmentation/recognition systems. Several methods have been proposed in the literature for estimating the above parameters. Below is a list of preprocessing techniques that have been employed by various researchers in an attempt to increase the performance of the segmentation/recognition process:

- Line removal from text
- Skew removal
- Reference line detection (lower/upper baselines)
- Slant estimation and correction
- Scaling and noise elimination
- Contour smoothing
- Skeletonization

i. Line removal from printed and script writing

In document images, printed lines frequently overlap with handwritten elements, especially in the case of signatures. Basically, these lines are used to align the writer along the horizontal axis. Typical examples of such images are bank cheques, receipts and payment slips. However, these lines create critical problems for an OCR system and therefore need to be detected and removed properly. Several approaches to underline removal have been proposed in the literature. Most of them detach the underline from the binarized image using the dilation and erosion operators of mathematical morphology (Yong et al., 1997; Serra, 1982; Cheriet and Suen, 2001; Charles et al., 1988). Dilation is applied until all lines longer than a fixed threshold are removed from the underline region. On the other hand, this operator


shatters the characters, which then become difficult to recognize, so erosion is applied to recover the lost parts of the characters. However, broken characters cannot be restored correctly in this way (Guillevic and Suen, 1993). Govindaraju and Srihari (1993) achieved underline removal by using a good-continuity criterion: it first detects the smooth strokes in the image, then identifies the spine of the image as the smooth stroke with maximum length, and finally removes it. However, this approach works on thinned images and therefore requires a preliminary, time-consuming thinning process; additionally, global properties of textual word shapes and those of interfering strokes were used to separate them. Yu and Jain (1996) proposed a method for line removal and character restoration using a Block Adjacency Graph representation of the input binary image. The horizontal form lines were located by finding long straight lines based on the block adjacency graph; form line separation and character reconstruction were also implemented from this graph. Yoo et al. (1997) classified the various types of junction points at the points of contact or crossing between the characters and the line. After line removal, the junction points are detected and restored based on their classification type. Koerich and Ling (1998) detected and removed lines using the horizontal projection profile (HPP). The removed regions are rectified by checking, for every pixel that could be fitted into the erased line, whether its neighbours satisfy certain conditions; based on this, the pixel is left on or off. Blumenstein et al. (2002) introduced new pre-processing techniques for underline removal and restoration based on horizontal black pixel runs. It is assumed that the word stroke thickness is similar to the thickness of the underlines present in the word; however, this assumption does not hold in all cases, particularly for printed documents and forms. Finally, the authors acknowledged that underline removal and restoration did not perform well on some of the more erratic underlines present in some word images, and the remainders of undetected underlines were removed manually to facilitate further processing.
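To make the projection-profile idea concrete, the NumPy sketch below marks as candidate lines those rows whose foreground-pixel count exceeds a large fraction of the image width, and erases bands that are not much thicker than a stroke. It is a simplified illustration of the HPP family of methods, not a reimplementation of any cited system; the threshold values are assumptions. Character strokes crossing the erased band are cut, so a separate restoration step, as discussed above, would still be needed.

import numpy as np

def remove_horizontal_lines(binary, row_fill_ratio=0.8, max_line_thickness=4):
    """binary: 2D array, 1 = ink, 0 = background. Returns a copy with long thin lines erased."""
    img = binary.copy()
    h, w = img.shape
    hpp = img.sum(axis=1)                           # horizontal projection profile
    line_rows = np.where(hpp >= row_fill_ratio * w)[0]
    if line_rows.size == 0:
        return img
    # group consecutive high-density rows into bands; erase only thin bands
    splits = np.where(np.diff(line_rows) > 1)[0] + 1
    for band in np.split(line_rows, splits):
        if band.size <= max_line_thickness:
            img[band[0]:band[-1] + 1, :] = 0
    return img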


In some algorithms, such as that proposed by Wang and Srihari (1991), broken characters are restored; if the character recognizer applied to the restored characters produces a wrong result, the characters are sent back to the restoration stage. In such methods the processing time is increased because of the feedback paths, and characters such as h and b are sometimes recognized incorrectly (Yong et al., 1997). Bai and Huo (2004) used connected-component and bottom-edge analysis to detect underlines in printed text; prior to removal of a detected underline, an OCR engine is used to recognize and verify the input text line. However, the approach deals only with underlines in printed text and fails for line removal in script. Recently, Arvind et al. (2007) detected multiple printed lines of varying thickness in the word image using the horizontal projection profile; restoration of the smashed characters is performed using the Bresenham line-drawing algorithm (Foley et al.). However, the technique cannot deal with restoration of printed characters or with skewed images. To conclude, the common problems with the techniques developed so far are:

1. They are computationally expensive, as they consist of two stages: line removal and restoration of smashed characters.
2. They deal with underline removal in printed text rather than general line removal.
3. They cannot deal with line removal in script writing.
4. They cannot deal with skewed line removal in script writing.

ii. Reference lines detection

Reference line detection in script writing is one of the most important preprocessing techniques. There are four reference lines: the upper line, the upper baseline, the lower baseline and the lower line, as shown in figure 3. Reference lines are employed by the research community for various purposes, such as estimating word height, core-region detection,


locating ascenders and descenders, and feature extraction for character segmentation/recognition. The crucial part of this process is the detection of the upper baseline and lower baseline of the word image, which bound what is commonly known as the core region. Accurate estimation of the core region is of great importance to cursive handwriting segmentation/recognition performance (Madhavanath and Shrihari, 1996). It determines the relative character height, which is essential, for example, to discriminate characters like e and l. Some strokes in a word image may extend above or below the core region, or main body, of a handwriting sample. Such letter components are called ascenders and descenders respectively (Senior and Robinson, 1996); examples of letters that contain such strokes are f, j and g. Hence the core region of a word image, which does not contain ascenders and descenders, is bounded by the upper baseline and the baseline (Brown and Ganapathy, 1983), as shown in figure 3.

Figure 3

Four reference lines and core region

The core region serves for a variety of operations, such as slant/skew removal and feature extraction for Latin and Arabic character segmentation and recognition (Be and Nohl, 1992; Nicchiotti and Scagliola, 1999; Bozinovic and Srihari, 1989; Caesar and Gloger, 1993; Madhavanath and Shrihari, 1996; Cote et


al., 1998; Blumenstein et al., 2002; Verma, 2002; Cheng and Blumenstein, 2005; Ramy El-Hajj et al., 2005). Historically, Bozinovic and Srihari (1989) were the first to detect reference lines based on the horizontal density histogram, in what is commonly known as the BSM method. The estimation of the core region, the fundamental step, is made by examining the first derivative of the horizontal density (the number of foreground pixels per line). The horizontal density histogram is analyzed for features such as maxima and first-derivative peaks, but these features are very sensitive to local characteristics, and many heuristic rules are needed to find the actual core-region lines (Morita, 1999). Alternative techniques were proposed in (Cote et al., 1998; Vinciarelli and Luettin, 2001). Vinciarelli and Luettin (2001) applied the Otsu method in order to find a threshold distinguishing between core-region lines (above the threshold) and other lines. In such works, the density distribution is analyzed rather than the density histogram itself, in order to make the influence of local strokes statistically negligible. Nevertheless, the core region can still be erroneously detected when the upper and lower baselines are incorrectly set. This occurs for words containing a number of characters with large horizontal strokes, such as the letters t, g and h, and for characters with long horizontal strokes in the two upper quarters of the word (Blumenstein et al., 2002).
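The basic density-histogram idea behind these approaches can be sketched as follows: the core region is taken to be the band of rows whose foreground density exceeds a threshold. Here the threshold is simply a fraction of the peak density, an assumption made for brevity rather than the Otsu criterion or the heuristic rules of the cited methods.

import numpy as np

def estimate_core_region(binary, density_fraction=0.5):
    """binary: 2D array, 1 = ink. Returns (upper_baseline_row, lower_baseline_row)."""
    density = binary.sum(axis=1).astype(float)     # horizontal density histogram
    if density.max() == 0:
        return 0, binary.shape[0] - 1
    threshold = density_fraction * density.max()
    core_rows = np.where(density >= threshold)[0]
    return int(core_rows[0]), int(core_rows[-1])   # band of high-density rows

Rows above the returned upper baseline then correspond to ascenders, and rows below the returned lower baseline to descenders.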


iii. Skew estimation and correction

Skew detection and correction are important preprocessing steps in document layout analysis and OCR approaches (Sarfraz et al., 2007). Furthermore, skew correction improves the performance of the classification system (Gatos et al., 2006a). Skew correction is the process of first detecting whether the handwritten word has been written on a slope and then rotating the word, if the slope angle is too high, so that the baseline of the word becomes horizontal. The slope is the angle between the horizontal direction and the direction of the line on which the writer aligned the word, as shown in figure 4.

Figure 4

Slope and Slant

Slope removal approaches try to correct this angle, obtaining text aligned with the horizontal direction. Skew removal aims at detecting the deviation of the image orientation angle from the horizontal direction, and the crucial step is to find the skew angle correctly. In the literature, most skew correction techniques are based on projection profiles (horizontal or vertical). The traditional method for slope removal (Bozinovic and Srihari, 1989) starts by finding a first rough estimate of the core region, the region enclosing the character bodies. This estimate is biased by the fact that the word is not horizontal, so the upper and lower limits of the estimated core region do not fit as well as they should. To solve this problem, the stroke minima closest to the lower limit of the estimated core region are used to fit the line connecting the bottom points of the character bodies. This is the line on which the word is aligned (called the lower baseline). The image is deskewed when the lower baseline is horizontal, a condition achieved by a rotational transform. However, core-zone detection is a prerequisite for this skew estimation, while core-zone lines are


usually obtained as the ones surrounding the highest density peaks, and this technique is strongly affected by the presence of long horizontal strokes that can be confused with the actual core region (Vinciarelli and Luettin, 2001). Caesar et al. (1993) fit a straight line through extreme values in the vertical direction to detect reference lines for skew correction. Further examples of techniques for slope correction are described in (Senior, 1994; Brown and Ganapathy, 1983). Senior and Robinson (1998) described a skew detection technique whereby minima in the lower contour of the word image are first located and a line of best fit is drawn through these points. Morita et al. (1999) proposed a method based on mathematical morphology to obtain a pseudo-convex hull image; minima are then detected on the pseudo-convex image and a reference line is fitted through those points. The primary challenge in these methods is the rejection of spurious minima, and the regression-based methods do not work well on short words because of the lack of a sufficient number of minima points. Cote et al. (1998) computed several histograms for different y projections; the entropy was then calculated for each of them, and the histogram with the lowest entropy determined the slope angle. Likewise, Kavallieratou et al. (1999) employed the Wigner-Ville distribution to calculate several horizontal projection histograms, and the slope angle was selected as the one whose Wigner-Ville distribution had the maximal intensity. The main problem with these distribution-based methods is their high computational cost, since the image has to be rotated for each angle. Cai and Liu (2000) rotated the image for each angle in an interval; the image was considered desloped when the rotated image produced the highest peak of the first derivative of the horizontal density histogram. Vinciarelli and Luettin (2001) proposed new techniques for both slant and slope angle estimation without any heuristics or manual parameters, thereby avoiding the heavy experimental effort required to find the optimal configuration of a parameter set. However, their technique was based on a cost function which measures slant absence across the word image.
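A minimal sketch of the line-fitting idea used in the lower-baseline approaches above (for example, Senior and Robinson, 1998) is given below: the lowest foreground pixel of each column is taken as the lower contour, a straight line is fitted through those points, and the slope is removed by vertically shearing the columns, a small-angle stand-in for true rotation. The sketch assumes a clean binary word image and performs no rejection of spurious minima, which the cited methods must handle.

import numpy as np

def deskew_by_lower_contour(binary):
    """binary: 2D array, 1 = ink, row index increasing downwards. Returns a corrected copy."""
    h, w = binary.shape
    cols, lows = [], []
    for x in range(w):
        ys = np.where(binary[:, x] > 0)[0]
        if ys.size:
            cols.append(x)
            lows.append(ys.max())          # lowest ink pixel = lower-contour point
    if len(cols) < 2:
        return binary.copy()
    slope, _ = np.polyfit(cols, lows, 1)   # least-squares line through the contour points
    corrected = np.zeros_like(binary)
    for x in range(w):
        # vertical shear: shift each column so the fitted baseline becomes flat
        shift = int(np.clip(round(slope * (x - w // 2)), -(h - 1), h - 1))
        if shift >= 0:
            corrected[:h - shift, x] = binary[shift:, x]
        else:
            corrected[-shift:, x] = binary[:h + shift, x]
    return corrected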


The cost function was evaluated on multiple shear-transformed word images, and the angle with the maximal cost was taken as the slant estimate. However, this is computationally heavy, since multiple shear-transformed word images corresponding to different angles in an interval have to be calculated (Dong et al., 2005). Dong et al. (2005) presented a new fast and robust technique for word skew correction based on the Radon transform: a global measure defined by the Radon transform of the image and its gradient is maximized to estimate the slope. Compared with previous methods, the proposed algorithm does not require parameters to be set heuristically; moreover, it performs well on words of short length, where the traditional methods usually fail, and it is computationally more efficient. Recently, Gatos et al. (2006a) introduced a skew correction technique based on horizontal projections. The skewed word is bisected and the horizontal projection of each part is calculated (see figure 5). Due to the word skew, the distributions of the left and right horizontal word projections exhibit a vertical offset, and the skew angle is calculated from this offset and the word width x_max.


Figure 5

Horizontal projection of left and right part of the word

Indeed, projection-profile (PP) based approaches have proved to be effective and accurate. However, projection profiles strictly assume straight reference lines; if the image is noisy and contains many characters with ascenders and descenders, then PP-based approaches cannot produce good results for skew detection and correction (Nicchiotti and Scagliola, 1999). In this regard, Nicchiotti and Scagliola (1999) employed generalized projections (GP), an extension of the projection profile, for skew detection and removal, but this made the whole process computationally heavy. A second class of skew correction methods is based on component analysis, in which the most significant eigenvector is calculated, leading to the skew angle of the distribution. The problem with this method is that each eigenvector is constructed with support from the projections of every point, which is expensive in terms of time; moreover, these are least-squares estimation techniques and hence fail to account for outliers, which are common in images. Blumenstein et al. (2002) introduced a new preprocessing technique for skew removal in which the word image is divided into two equal parts and the skew angle is detected by drawing a line between the centres of the two parts, as shown in figure 6. Furthermore, skew detection was


achieved through elimination of the word's ascender and descender information, which increases the computational burden.

Figure 6 Skew angle estimation
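As an illustration of the rotate-and-score family of methods mentioned above (for example, Cai and Liu, 2000), the sketch below rotates the word over a range of candidate angles and keeps the angle whose horizontal density histogram has the sharpest first-derivative peak. The angle range, step and scoring detail are assumptions chosen for brevity, not the settings of any cited work.

import numpy as np
from scipy import ndimage

def estimate_skew_by_rotation_scan(binary, angle_range=10.0, step=0.5):
    """binary: 2D array, 1 = ink. Returns the corrective rotation angle in degrees."""
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-angle_range, angle_range + step, step):
        rotated = ndimage.rotate(binary.astype(float), angle,
                                 reshape=False, order=0)
        density = rotated.sum(axis=1)               # horizontal density histogram
        score = np.abs(np.diff(density)).max()      # sharpest first-derivative peak
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

Because the score is computed on the already rotated image, the deskewed word is simply the word rotated by the returned angle.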

To conclude, in most of the literature reviewed, preprocessing techniques are described as part of an overall system for handwriting recognition (Senior and Robinson, 2000; Kim et al., 1999; Madhvanath et al., 1999), and it is therefore very hard to compare results.

iv. Slant estimation and correction

Slant estimation and correction is an integral part of word image preprocessing because handwritten text is usually characterized by slanted characters and therefore needs to be normalized. The slant is the clockwise or anticlockwise angle between the vertical direction and the vertical text strokes (Pastor et al., 2004). In an ideal model of handwriting, strokes are supposed to be vertical; however, detection of the correct slant angle is a crucial step.


The literature is replete with approaches to slant angle detection, which can be roughly divided into three groups; the first two deal with uniform slant correction, whilst the third deals with local slant correction. Initially, Bozinovic and Srihari (1989) calculated the slant angle by detecting near-vertical strokes and taking their average angle as the shear angle. Later, several others, such as Kim and Govindaraju (1996), Knerr et al. (1998), Senior and Robinson (1998), Vinciarelli and Luettin (2001) and Pastor et al. (2004), followed their lead, using different criteria to select near-vertical strokes, whose slopes are estimated from the contours. Compared to the other approaches this method is quite fast but, on the downside, it relies heavily on heuristics and hence is not very robust. Additionally, such approaches require detection of the character edges, and their accuracy depends on the characters included in the word (Zeeuw, 2006). The second group evaluates a measure function of the image over a range of shear angles and selects the angle with the highest value of the measure. The measure is most often computed from the vertical histogram, based on the idea that deslanted writing has a more intense histogram, i.e. with higher and more pronounced peaks, than slanted writing: the text is sheared for a discrete number of angles around the vertical orientation, the vertical projection profile is calculated for each sheared image, and the profile giving the maximum variation is taken as corresponding to the deslanted image. In this regard, Kavallieratou et al. (1999, 2000, 2001) proposed a slant correction technique based on a hybrid of the Wigner-Ville distribution and the projection profile technique, in which the Wigner-Ville distribution is used to measure the degree of variation among the different vertical projection profiles; the hybrid approach was later integrated into a complete image processing system (Kavallieratou, 2003). Cote et al. (1998) computed several histograms for different y projections.


The entropy is then calculated for each of them, and the histogram with the lowest entropy determines the slant angle. A few more approaches along these lines are detailed in (Cai and Liu, 2000; El-Yacoubi et al., 1999). The approaches based on such optimization are relatively robust; however, they are computationally heavy, since multiple shear-transformed word images corresponding to different angles in an interval have to be calculated. Additionally, an average slant angle is computed based on the structural features of all characters in the word, which is not valid for every character. Lastly, the third group is distinguished from all its predecessors by correcting non-uniform slant. The techniques above shear a word (or a bigger unit) uniformly, i.e. by a single angle, and hence can never fully cope with variant-slanted words. Uchida et al. (2001) and Taira et al. (2004) tackled the problem of non-uniform slant correction using dynamic programming techniques. To apply different shear angles at different points within a word, the word has to be split into intervals and each interval sheared individually; to determine which intervals to take, and by what angle to shear over each interval, a criterion is optimized that evaluates the sequences of intervals and angles simultaneously. Dong et al. (2005) presented a new fast and robust technique for word slant correction based on the Radon transform: the Radon transform is used to estimate the long strokes, and the word slant is measured as the average angle of these long strokes. Compared with previous methods, the algorithm does not require parameters to be set heuristically; moreover, it performs well on words of short length, where the traditional methods usually fail, and it is computationally more efficient. To conclude, the non-uniform methods have a lot of potential, since they can cope with variant-slanted words. The results are indeed promising, although there are more robustness issues, as the algorithm has greater freedom to make errors within a word, and the theoretical background and mathematical techniques are somewhat more demanding. Finally, an independent correction of each component is also not viable, since this may produce distortions when broken


characters are present in the string. Common problems in the existing slant angle estimation approaches are:

- They are computationally heavy and therefore slow.
- They depend heavily on heuristics and therefore are not robust.
- They compute an average slant angle that is not suitable for all characters in the word.
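A compact sketch of the second group of methods (shear-scan with a vertical-histogram measure) is given below. The word is sheared horizontally by a set of candidate angles, and the angle whose vertical projection has the most pronounced peaks, measured here simply as the sum of squared column counts, is returned. The angle range and the scoring function are illustrative assumptions, not those of any cited work.

import numpy as np

def shear_horizontally(binary, angle_deg):
    """Shift each row sideways in proportion to its height above the bottom row."""
    h, w = binary.shape
    t = np.tan(np.radians(angle_deg))
    out = np.zeros_like(binary)
    for y in range(h):
        shift = int(np.clip(round((h - 1 - y) * t), -(w - 1), w - 1))
        if shift >= 0:
            out[y, shift:] = binary[y, :w - shift]
        else:
            out[y, :w + shift] = binary[y, -shift:]
    return out

def estimate_slant_correction(binary, angle_range=45.0, step=1.0):
    """Return the shear angle (degrees) whose vertical histogram is most intense."""
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-angle_range, angle_range + step, step):
        vpp = shear_horizontally(binary, angle).sum(axis=0).astype(float)
        score = float(np.sum(vpp ** 2))             # higher, sharper peaks score better
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

Applying shear_horizontally with the returned angle yields the deslanted word; for non-uniform slant, as in the dynamic-programming methods above, the shear angle would have to vary along the word.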

v. Scaling and noise elimination

Scaling may sometimes be necessary to produce characters or words of comparable size. Burges et al. (1992) used a neural network for the segmentation stage of their system, taking the core region of the word as the network input; since the network requires an input of fixed size, all words were scaled so that the cores had a uniform height (Gatos et al., 2006a; Indira and Selvi, 2007). Noise may be introduced into document images through (1) physical degradation of the hard-copy documents during creation and/or storage, and (2) the digitization procedure, such as scanning. Most document enhancement algorithms can remove large noise components (e.g., marginal black strips) and small noise components (e.g., salt-and-pepper noise) with morphological operations. However, noise components of a size comparable to printed words cannot be removed easily. O'Gorman (1992) proposed the k-Fill algorithm, a manually designed approach that has been used by several other researchers; experiments show it is effective for removing salt-and-pepper noise. Chen et al. (1992) used morphological opening operations to remove noise in handwritten words. Liang et al. (1996) proposed a semi-manually designed approach with a 3x3 window size. Loce (1997) used artificially degraded images generated by models for training. Kanungo et al. (1994, 1995, 2001) proposed methods for validation and parameter estimation of degradation models. Though the uniformity and sensitivity of their


approach has been tested by other researchers, no degradation model has been declared to pass the validation. Another problem with morphological approaches is the small window size: the most commonly used windows are no larger than 5x5, which is too small to contain enough information for enhancement. Kim et al. (1999) identified noise in a word image by comparing the sizes and shapes of connected components in the image to the average stroke width. Madhvanath et al. (1999) also analyze the size and shape of connected components in a word image and compare them to a threshold to remove salt-and-pepper noise. These approaches only identify and remove small noise components. The removal of large noise components is also addressed in the literature, for example marginal noise removal and underline, line and skewed-line removal. In postal address words and other real-world applications, larger noise such as underlines is sometimes present; therefore, a few researchers have also applied some form of underline removal to their word images (Dimauro et al., 1997; Pirlo and Salzo, 1997; Blumenstein et al., 2002). Moreover, it is hard to discriminate noise from text of a comparable size. Yefeng Zheng (2006) first segments the document at a suitable level, and each segmented block is classified into machine-printed text, handwriting, or noise. Machine-printed text, handwriting and noise have different visual appearances and physical structures, and structural features are extracted to reflect these differences. Gabor filter features and run-length histogram features can capture the differences in stroke orientation and stroke length between handwriting and printed text. Compared with text, noise blocks often have a simple stroke complexity; therefore, crossing-count histogram features are exploited to model such differences. Regions of machine-printed text, handwriting and noise were also treated as different textures, and two sets of bi-level texture features were used for classification.
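The connected-component filtering idea used by Kim et al. (1999) and Madhvanath et al. (1999) can be sketched as follows: components whose pixel count falls below a threshold tied to the estimated stroke width are treated as salt-and-pepper noise and removed. The way the threshold is derived from the stroke width here is an assumption made for illustration.

import numpy as np
from scipy import ndimage

def remove_small_components(binary, stroke_width_estimate=3):
    """binary: 2D array, 1 = ink. Removes components smaller than a stroke-sized blob."""
    labels, n = ndimage.label(binary)
    if n == 0:
        return binary.copy()
    sizes = ndimage.sum(binary, labels, index=np.arange(1, n + 1))
    min_size = stroke_width_estimate ** 2            # assumed noise-size threshold
    keep = np.zeros(n + 1, dtype=bool)
    keep[1:] = sizes >= min_size                     # background label 0 stays False
    return np.where(keep[labels], binary, 0)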


SUMMARY

The goal of normalization/preprocessing techniques is to eliminate the handwriting variability that is inherent in cursive handwriting. Consequently, they facilitate the segmentation and feature extraction processes and also improve classification accuracy. Hence preprocessing techniques are normally included in all document analysis and recognition systems. This chapter has presented a critical review of the existing normalization techniques. New normalization techniques recently developed by the authors are also presented, and comparison of the new techniques with the older ones demonstrates their worth in terms of accuracy and computational complexity.

REFERENCES

ALKOOT, M., AND KITTLER, J. (1999). Experimental evaluation of expert fusion strategies. Pattern Recognition Letters, 20(11-13): 1361-1369.
ARICA, N., AND YARMAN-VURAL, F. T. (2002). Optical character recognition for cursive handwriting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6): 801-813.
ARVIND, K. R., KUMAR, J., AND RAMAKRISHNAN, A. G. (2007). Line Removal and Restoration of Handwritten Strokes. Proceedings of the International Conference on Computational Intelligence and Multimedia Applications, 208-214.
BELONGIE, S., MALIK, J., AND PUZICHA, J. (2002). Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Analysis and Machine Intelligence, 24(2), 509-522.
BIPPUS, R., AND MARGNER, V. (1999). Script recognition using inhomogeneous P2DHMM and hierarchical search space reduction. In Proc. 5th International Conference on Document Analysis and Recognition, 773-776, Bangalore, India.


BISHNU, A., AND CHAUDHURI, B. B. (2007). Segmentation of Bangla Handwritten Text into Characters by Recursive Contour Following. Proceedings on Computing: Theory and Applications (ICCTA '07).
BLUMENSTEIN, M., AND VERMA, B. (1997). A Segmentation Algorithm used in Conjunction with Artificial Neural Networks for the Recognition of Real-World Postal Addresses. Proceedings of the 2nd Online World Conference on Soft Computing in Engineering Design and Manufacturing.
BLUMENSTEIN, M., AND VERMA, B. (1998a). An Artificial Neural Network Based Segmentation Algorithm for Off-line Handwriting Recognition. Proceedings of the Second International Conference on Computational Intelligence and Multimedia Applications, Gippsland, Australia, 306-311.
BLUMENSTEIN, M., AND VERMA, B. (1998b). A Neural Based Segmentation and Recognition Technique for Handwritten Words. Proceedings of the World Congress on Computational Intelligence, Anchorage, Alaska, 1738-1742.
BLUMENSTEIN, M., AND VERMA, B. (1998c). Conventional vs. Neuro-Conventional Segmentation Techniques for Handwriting Recognition: A Comparison. Proceedings of the Second IEEE International Conference on Intelligent Processing Systems, Gold Coast, Australia, 473-477.
BLUMENSTEIN, M., AND VERMA, B. (1999a). Neural Solutions for the Segmentation and Recognition of Difficult Words from a Benchmark Database. Proceedings of the 5th International Conference on Document Analysis and Recognition (ICDAR '99), Bangalore, India, 281-284.
BLUMENSTEIN, M., AND VERMA, B. (1999b). A New Segmentation Algorithm for Handwritten Word Recognition. Proceedings of the International Joint Conference on Neural Networks, Washington, D.C., Vol. 4, 878-882.


BLUMENSTEIN, M., AND VERMA, B. (2001). Analysis of Segmentation Performance on the CEDAR Benchmark Database. Proceedings of the 6th International Conference on Document Analysis and Recognition, 1142-1146.
BLUMENSTEIN, M., CHENG, C. K., AND LIU, X. Y. (2002). New Preprocessing Techniques for Handwritten Word Recognition. Proceedings of the 2nd International Conference on Visualization, Imaging and Image Processing, ACTA Press, Calgary, 480-484.
BLUMENSTEIN, M., VERMA, B., AND BASLI, H. (2003). A novel feature extraction technique for the recognition of segmented handwritten characters. In M. Fairhurst and A. Downton (Eds.), Proceedings of the Seventh International Conference on Document Analysis and Recognition, 137-141.
BLUMENSTEIN, M., LIU, X. Y., AND VERMA, B. (2004). A Modified Direction Feature for Cursive Character Recognition. Proceedings of the International Joint Conference on Neural Networks, Budapest, Hungary, 2983-2987.
BLUMENSTEIN, M., LIU, X. Y., AND VERMA, B. (2007). An Investigation of the Modified Direction Feature for Cursive Character Recognition. Pattern Recognition, Vol. 40, 376-388.
BORTOLOZZI, F., DE SOUZA BRITTO JR., A., OLIVEIRA, L. S., AND MORITA, M. (2005). Recent Advances in Handwriting Recognition. In Document Analysis, Umapada Pal, Swapan K. Parui and Bidyut B. Chaudhuri (Eds.), 1-30.
BOSE, C. B., AND KUO, S. (1994). Connected and Degraded Text Recognition using Hidden Markov Model. Pattern Recognition, 27(10), 1345-1363.
BOZINOVIC, R. M., AND SRIHARI, S. N. (1989). Off-line Cursive Script Word Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1), 68-83.
BRETTO, A., AZEMA, J., CHERIFI, H., AND LAGET, B. (1997). Combinatorics and Image Processing. Graphical Models and Image Processing, vol. 59, 256-277.


BRETTO, A., CHERIFI, H., AND ABOUTAJDINE, D. (2002). Hypergraph imaging: an Overview. Pattern Recognition, vol. 35, 651-658.
BREUKELEN, T. M., DUIN, R., AND KITTLER, J. (2000). Combining multiple classifiers by averaging or by multiplying? Pattern Recognition, 33(9): 1475-1485.
BREUEL, T. (1994). Design and Implementation of a System for Recognition of Handwritten Responses on US Census Forms. Proceedings of the IAPR Workshop on Document Analysis Systems, Kaiserslautern, Germany, 237-264.
BROWN, M. K., AND GANAPATHY, S. (1983). Preprocessing Techniques for Cursive Script Word Recognition. Pattern Recognition, 16(5), 447-458.
BREIMAN, L. (1996). Bagging predictors. Machine Learning, 24(2): 123-140.
BRITTO JR., A., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2001a). An Enhanced HMM Topology in an LBA Framework for the Recognition of Handwritten Numeral Strings. Proceedings of the International Conference on Advances in Pattern Recognition, Vol. 1, 105-114, Rio de Janeiro, Brazil.
BRITTO JR., A., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2001b). A two-stage HMM-based system for recognizing handwritten numeral strings. Proceedings of the International Conference on Document Analysis and Recognition, 396-400, Seattle, USA.
BRITTO JR., A., SABOURIN, R., LATHERIER, E., BORTOLOZZI, F., AND SUEN, C. Y. (2000). Improvement in handwritten numeral string recognition by slant normalization and contextual information. Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition, 323-332.
BRITTO JR., A., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2002). A string length predictor to control the level building of HMMs for handwritten numeral recognition. Proceedings of the 16th International Conference on Pattern Recognition, Vol. 4, 31-34.


BRITTO JR., A., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2004). Foreground and background information in an HMM-based method for recognition of isolated characters and numeral strings. Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition, 371-376.
BUNKE, H., ROTH, M., AND SCHUKAT-TALAMAZZINI, E.-G. (1995). Off-line cursive handwriting recognition using hidden Markov models. Pattern Recognition, 28(9): 1399-1413.
BUNKE, H., AND WANG, P. (Eds.) (1997). Handbook of Character Recognition and Document Image Analysis. World Scientific, 123-156.
BURGES, C. J. C., BE, J. I., AND NOHL, C. R. (1992). Recognition of Handwritten Cursive Postal Words using Neural Networks. Proceedings of the 5th United States Postal Service (USPS) Advanced Technology Conference, 117-124.
BURGES, C. J. C., DENKER, J. S., LECUN, Y., AND NOHL, C. R. (1993). Off-line recognition of handwritten postal words using neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 7(4), 689-704.
CAESAR, T., GLOGER, J. M., AND MANDLER, E. (1993). Preprocessing and Feature Extraction for a Handwriting Recognition System. Proceedings of the International Conference on Document Analysis and Recognition, 408-411.
CAI, J., AND LIU, Z.-Q. (1999). Integration of structural and statistical information for unconstrained handwritten numeral recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(3), 263-270.
CAI, J., AND LIU, Z.-Q. (2000). Off-line unconstrained handwritten word recognition. International Journal of Pattern Recognition and Artificial Intelligence, 14(3), 259-280.


CAMASTRA, F., AND VINCIARELLI, A. (2003). Combining neural gas and learning vector quantization for cursive character recognition. Neurocomputing, 51, 147-159.
CAO, J., AHMADI, M., AND SHRIDHAR, M. (1995). Recognition of handwritten numerals with multiple feature and multistage classifier. Pattern Recognition, 28(3), 153-159.
CASEY, R. G. (1992). Segmentation of Touching Characters in Postal Addresses. Proceedings of the 5th USPS Advanced Technology Conference, 743-754.
CASEY, R. G., AND LECOLINET, E. (1996). A Survey of Methods and Strategies in Character Segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 18, 690-706.
CAVALIN, P. R., BRITTO, A. S., BORTOLOZZI, F., SABOURIN, R., AND OLIVEIRA, L. S. (2006). An Implicit Segmentation-based Method for Recognition of Handwritten Strings of Characters. Proceedings of the ACM Symposium on Applied Computing, 836-840.
CHANGMING, S., AND DEYI, S. (1997). Skew and Slant Correction for Document Images Using Gradient Direction. Proceedings of the 4th International Conference on Document Analysis and Recognition, 142-146.
CHARLES, R. GIARDINA, AND EDWARD R. DOUGHERTY (1988). Morphological Methods in Image and Signal Processing. Prentice Hall, Inc.
CHEN, M.-Y., KUNDU, A., ZHOU, J., AND SRIHARI, S. N. (1992). Off-Line Handwritten Word Recognition using Hidden Markov Model. Proceedings of the 5th USPS Advanced Technology Conference, 563-579.


CHELLAPILLA, K., SHILMAN, M., AND SIMARD, P. (2006). Combining Multiple Classifiers for Faster Optical Character Recognition. Proceedings of the International Conference on Document Analysis Systems, Springer, LNCS 3872, 358-367.
CHO, W., LEE, S. W., AND KIM, J. H. (1995). Modeling and Recognition of Cursive Words with Hidden Markov Models. Pattern Recognition, 28(12), 1941-1953.
CHO, S.-B. (1997). Neural-network classifiers for recognizing totally unconstrained handwritten numerals. IEEE Transactions on Neural Networks, 8(1), 43-53.
CHEN, M.-Y., AND KUNDU, A. (1993). An Alternative Approach to Variable Duration HMM in Handwritten Word Recognition. Proceedings of the 3rd International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, 82-91.
CHEN, M.-Y., KUNDU, A., AND ZHOU, J. (1994). Off-Line Handwritten Word Recognition Using a HMM Type Stochastic Network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5), 481-496.
CHEN, M.-Y., KUNDU, A., AND SRIHARI, S. N. (1995). Variable duration hidden Markov model and morphological segmentation for handwritten word recognition. IEEE Transactions on Image Processing, 4(12), 1675-1687.
CHEN, Y., AND LEEDHAM, G. (2005). Independent Component Analysis Segmentation Algorithm. Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR '05), Vol. 2, 680-684.
CHENG, C. K., LIU, X. Y., BLUMENSTEIN, M., AND MUTHUKKUMARASAMY, V. (2004). Enhancing Neural Confidence-Based Segmentation for Cursive Handwriting Recognition. 5th International Conference on Simulated Evolution and Learning, Busan, Korea, SWA-8.
CHENG, C. K., AND BLUMENSTEIN, M. (2005a). The Neural Based Segmentation of Cursive Words Using Enhanced Heuristics. Proceedings of the 8th International Conference on Document Analysis and Recognition, Vol. 2, 650-654.


CHENG, C. K., AND BLUMENSTEIN, M. (2005b). Improving the Segmentation of Cursive Handwritten Words using Ligature Detection and Neural Validation. Proceedings of the 4th Asia Pacific International Symposium on Information Technology (APIS 2005), Gold Coast, Australia, 56-59.
CHERIET, M. (1993). Reading Cursive Script by Parts. Proceedings of the 3rd International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, May 25-27, 403-408.
CHERIET, M., KHARMA, N., LIU, C.-L., AND SUEN, C.-Y. (2007). Character Recognition Systems (OCR). Wiley, 204-206.
CHEVALIER, S., GEOFFROIS, E., PRETEUX, F., AND LEMAITRE, M. (2005). A Generic 2D Approach of Handwriting Recognition. Proceedings of the 8th International Conference on Document Analysis and Recognition, 489-493.
CHIANG, J.-H. (1998). A hybrid neural model in handwritten word recognition. Neural Networks, 11(2), 337-346.
COTE, M., LECOLINET, E., CHERIET, M., AND SUEN, C. Y. (1998). Automatic reading of cursive scripts using a reading model and perceptual concepts: the PERCEPTO system. International Journal of Document Analysis and Recognition, 1(1), 3-17.
DAWOUD, A. (2007). Iterative Cross Section Sequence Graph for Handwritten Character Segmentation. IEEE Transactions on Image Processing, 16(8).
DECOSTE, D., AND SCHÖLKOPF, B. (2002). Training invariant support vector machines. Machine Learning Journal, 46(1-3): 161-190.
DIMAURO, G., IMPEDOVO, S., PIRLO, G., AND SALZO, A. (1997). Removing Underlines from Handwritten Text: An Experimental Investigation. In Progress in Handwriting Recognition, A. C. Downton and S. Impedovo (Eds.), World Scientific Publishing, 497-501.
DIMAURO, D., IMPEDOVO, S., PIRLO, G., AND SALZO, A. (1998). An Advanced Segmentation Technique for Cursive Word Recognition. In Advances in Handwriting Recognition, S. W. Lee (Ed.), World Scientific Publishing, 255-264.


DING, Y., KIMURA, F., MIYAKE, Y., AND SHRIDHAR, M. (1999). Evaluation and Improvement of Slant Estimation for Handwritten Words. Proceedings of the 5th International Conference on Document Analysis and Recognition, Bangalore, India, 753-756.
DING, Y., KIMURA, F., MIYAKE, Y., AND SHRIDHAR, M. (2000). Accuracy improvement of slant estimation for handwritten words. Proceedings of the 15th International Conference on Pattern Recognition, Vol. 4, 527-530.
DONG, J.-X., KRZYZAK, A., AND SUEN, C.-Y. (2001). A multi-net learning framework for pattern recognition. Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle, 328-332.
DONG, J.-X., DOMINIQUE, P., KRZYZAK, A., AND SUEN, C.-Y. (2005). Cursive word skew/slant corrections based on Radon transform. Proceedings of the 8th International Conference on Document Analysis and Recognition, 478-483.
DUNN, C. E., AND WANG, P. S. P. (1992). Character Segmenting Techniques for Handwritten Text - A Survey. Proceedings of the 11th International Conference on Pattern Recognition, Vol. 2, 577-591.
DZUBA, G., FILATOV, A., GERSHUNY, D., AND KILL, I. (1998). Handwritten word recognition: the approach proved by practice. In Proc. 6th International Workshop on Frontiers in Handwriting Recognition, 99-111, Taejon, Korea.
EARNEST, L. D. (1962). Machine Recognition of Cursive Writing. In Information Processing, C. Cherry (Ed.), Butterworth, London, 462-466.
EASTWOOD, B., JENNINGS, A., AND HARVEY, A. (1997). Neural Network Based Segmentation of Handwritten Words. Proceedings of the 6th International Conference on Image Processing and its Applications, Vol. 2, 750-755.
EHRICH, R. W., AND KOEHLER, K. J. (1975). Experiments in the Contextual Recognition of Cursive Script. IEEE Transactions on Computers, 24, 182-194.


ELMS, A. J., PROCTER, S., AND ILLINGWORTH, J. (1998). The Advantage of using an HMM-based Approach for Faxed Word Recognition. International Journal on Document Analysis and Recognition, 18-36.
ELLIMAN, D. G., AND LANCASTER, I. T. (1990). A review of segmentation and contextual analysis techniques for text recognition. Pattern Recognition, 23(3-4), 337-346.
EL-YACOUBI, A., GILLOUX, M., SABOURIN, R., AND SUEN, C. Y. (1999). An HMM-based Approach for Off-line Unconstrained Handwritten Word Modeling and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8): 752-760.
FAN, X., AND VERMA, B. (2002). Segmentation vs. non-segmentation based neural techniques for cursive word recognition. International Journal of Computational Intelligence and Applications, 2(4), 1-8.
FAROUZ, C., GILLOUX, M., AND BERTILLE, J. M. (1998). Handwritten word recognition with contextual Hidden Markov Models. In Proc. 6th International Workshop on Frontiers in Handwriting Recognition, 133-142, Taejon, Korea.
FAVATA, J. T., AND SRIHARI, S. N. (1992). Recognition of General Handwritten Words Using Hypothesis Generation and Reduction Methodology. Proceedings of the 5th USPS Advanced Technology Conference, 237-251.
FAVATA, J. T. (1997). Character Model Word Recognition. In Progress in Handwriting Recognition, A. C. Downton and S. Impedovo (Eds.), 57-61.
FAVATA, J. T. (2001). Offline General Handwritten Word Recognition using an Approximate Beam Matching Algorithm. IEEE Trans. on Pattern Analysis and Machine Intelligence, 23: 393-398.
FOLEY, J. D., VAN DAM, A., FEINER, S. K., AND HUGHES, J. F. Computer Graphics: Principles and Practice in C, 2nd Edition. Addison-Wesley, Pearson Education.


FREITAS, C. O. A., BORTOLOZZI, F., AND SABOURIN, R. (2001). Handwritten isolated word recognition: An approach based on mutual information for feature set validation. In Proc. 6th International Conference on Document Analysis and Recognition, Seattle, USA, September, 665-669.
FRISHKOPF, L. S., AND HARMON, L. D. (1961). Machine Reading of Cursive Script. In Information Theory, C. Cherry (Ed.), Butterworth, London, 300-316.
FUJISAWA, H., NAKANO, Y., AND KURINO, K. (1992). Segmentation methods for character recognition: From segmentation to document structure analysis. Proceedings of the IEEE, 80(7), 1079-1092.
FUKUSHIMA, K., AND IMAGAWA, T. (1993). Recognition and Segmentation of Connected Characters with Selective Attention. Neural Networks, 6, 33-41.
GADER, P. D., MOHAMMED, M. A., AND CHIANG, J. H. (1994). Handwritten word recognition with character and inter-character neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 27: 158-164.
GADER, P. D., WHALEN, M., GANZBERGER, M., AND HEPP, D. (1995). Handprinted word recognition on a NIST data set. Machine Vision and Applications, Vol. 8, 31-41.
GADER, P. D., MOHAMED, M., AND CHIANG, J.-H. (1997). Handwritten word recognition with character and inter-character neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 27(1), 158-164.
GADER, P. D., MOHAMED, M. A., AND KELLER, J. M. (1996b). Fusion of handwritten word classifiers. Pattern Recognition Letters, 17(6): 577-584.
GADER, P. D., AND KHABOU, M. A. (1996a). Automatic Feature Generation for Handwritten Digit Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18(12), 1256-1261.
GANG, L., VERMA, B., AND KULKARNI, S. (2002). Experimental analysis of neural network based feature extractors for cursive handwriting recognition. Proceedings of the IEEE World Congress on Computational Intelligence, 2837-2841.

Pre-processing Techniques for Offline Cursive Handwriting Recognition: Recent Advances

31

handwriting recognition. Proceedings of the IEEE World Congress on Computational Intelligence, 2837-2841.
GATOS, B., PRATIKAKIS, I., AND PERANTONIS, S. J. (2006a). Hybrid Off-Line Cursive Handwriting Word Recognition. Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Vol. 2, 998-1002.
GATOS, B., PRATIKAKIS, I., KESIDIS, A.L., AND PERANTONIS, S.J. (2006b). Efficient off-line cursive handwriting word recognition. Proceedings of the Tenth International Workshop on Frontiers in Handwriting Recognition.
GATOS, B., ANTONACOPOULOS, A., AND STAMATOPOULOS, N. (2007). ICDAR 2007 Handwriting Segmentation Contest. Proceedings of the International Conference on Document Analysis and Recognition, 1284-1288.
GHOSH, M., GHOSH, R., AND VERMA, B. (2004). A Fully Automated Offline Handwriting Recognition System Incorporating Rule Based Neural Network Validated Segmentation and Hybrid Neural Network Classifier. International Journal of Pattern Recognition and Artificial Intelligence, 18(7), 1267-1283.
GILLIES, M. (1992). Cursive Word Recognition Using Hidden Markov Models. In Proc. Fifth U.S. Postal Service Advanced Technology Conference, 557-562.
GILLOUX, M., BERTILLE, J. M., AND LEROUX, M. (1993). Recognition of Handwritten Words in a Limited Dynamic Vocabulary. Proceedings of the Third International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, May 25-27, 417-422.
GILLOUX, M. (1993). Hidden Markov Models in Handwriting Recognition. Fundamentals in Handwriting Recognition, S. Impedovo (ed.), NATO ASI Series F: Computers and Systems Sciences, 124, Springer Verlag, New York, 264-288.
GILLOUX, M., LEROUX, M., AND BERTILLE, J-M. (1995a). Strategies for cursive script recognition using Hidden Markov Models. Machine Vision and Applications, 197-205.

GILLOUX, M., LEMARIE, B., AND LEROUX, M. (1995b). A hybrid radial basis function / Hidden Markov Model handwritten word recognition system. International Conference on Document Analysis and Recognition, Montreal, 394-397.
GOVINDARAJU, V., AND SRIHARI, S. H. (1992). Separating Handwritten Text from Interfering Strokes. From Pixels to Features III - Frontiers in Handwriting Recognition, S. Impedovo, J.C. Simon (eds.), North-Holland Publication, 17-28.
GRANDIDIER, F. (2003). Un Nouvel Algorithme de Sélection de Caractéristiques - Application à la Lecture Automatique de l'Écriture Manuscrite. PhD thesis, École de Technologie Supérieure, Montreal, Canada, January.
GUILLEVIC, D., AND SUEN, C. Y. (1993). Cursive Script Recognition: A Fast Reader Scheme. Proceedings of the 3rd International Conference on Document Analysis and Recognition, 311-314.
GUILLEVIC, D., AND SUEN, C. Y. (1994). Cursive Script Recognition: A Sentence Level Recognition Scheme. Proceedings of the 4th International Workshop on the Frontiers of Handwriting Recognition, 216-223.
GUILLEVIC, D., AND SUEN, C. (1998). HMM-KNN word recognition engine for bank check processing. In Proc. International Conference on Pattern Recognition, 1526-1529, Brisbane, Australia.
GÜNTER, S., AND BUNKE, H. (2003). Ensembles of classifiers for handwritten word recognition. International Journal on Document Analysis and Recognition, 5, 224-232.
GÜNTER, S., AND BUNKE, H. (2004). Feature selection algorithms for the generation of multiple classifier systems and their application to handwritten word recognition. Pattern Recognition Letters, 25(11), 1323-1336.
GÜNTER, S., AND BUNKE, H. (2005). Off-line cursive handwriting recognition using multiple classifier systems. On the influence of vocabulary, ensemble, and training set size. Optics and Lasers in Engineering, 43(3-5), 437-454.

GUYON (1996). Handwriting synthesis from handwritten glyphs. 5th International Workshop on Frontiers of Handwriting Recognition, 309-312.
HA, T., AND BUNKE, H. (1997). Off-line handwritten numeral recognition by perturbation method. IEEE Trans. Pattern Analysis and Machine Intelligence, 19(5), 535-539.
HA, T., ZIMMERMANN, M., AND BUNKE, H. (1998). Off-line Handwritten Numeral String Recognition by Combining Segmentation-based and Segmentation-free Methods. Pattern Recognition, 31(3), 257-272.
HAMAMURA, T., AKAGI, T., AND IRIE, B. (2007). An Analytic Word Recognition Algorithm Using a Posteriori Probability. Proceedings of the International Conference on Document Analysis and Recognition, Vol. 02, 669-673.
HAN, K., AND SETHI, I. K. (1995). Off-line Cursive Handwriting Segmentation. Proceedings of the 3rd International Conference on Document Analysis and Recognition, 894-897.
HANMANDLU, M., MURALI, K.R.M., CHAKRABORTY, S., GOYAL, S., AND CHOUDHURY, D.R. (2003). Unconstrained handwritten character recognition based on fuzzy logic. Pattern Recognition, 36(3), 603-623.
HAYES, K. C. (1980). Reading Handwritten Words Using Hierarchical Relaxation. Computer Graphics and Image Processing, Vol. 14, 344-364.
HELMERS, M., AND BUNKE, H. (2003). Generation and use of synthetic training data in cursive handwriting recognition. First Iberian Conf. on Pattern Recognition and Image Analysis, 336-345.
HEUTTE, L., PAQUET, T., MOREAU, J. V., LECOURTIER, Y., AND OLIVIER, C. (1998). A structural/statistical feature-based vector for handwritten character recognition. Pattern Recognition Letters, 19, 629-641.
HO, T.K. (1998). The random subspace method for constructing decision forests. IEEE Trans. on Pattern Analysis and Machine Intelligence, 20(8), 832-844.

HOLT, M., BEGLOU, M., AND DATTA, S. (1992). Slant-Independent Letter Segmentation for Off-line Cursive Script Recognition. From Pixels to Features III, S. Impedovo and J.C. Simon (eds.), Elsevier, 41.
HOWE, N.R., RATH, T.M., AND MANMATHA, R. (2005). Boosted decision trees for word recognition in handwritten document retrieval. Proceedings of the 28th Annual SIGIR Conference on Research and Development in Information Retrieval, 377-383.
HULL, J. (1998). Document Image Skew Detection: Survey and Annotated Bibliography. In Document Analysis Systems II, World Scientific, 40-64.
ISHITANI, Y. (1993). Document Skew Detection Based on Local Region Complexity. Proceedings of the 2nd International Conference on Document Analysis and Recognition, Tsukuba Science City, Japan, 49-52.
INDIRA, K., AND SELVI, S. (2007). An Off-line Cursive Script Recognition System using Fourier-Wavelet Features. International Conference on Computational Intelligence and Multimedia Applications, 506-511.
JIANG, T., AND ZHANG, K.S. (2004). Efficient and robust feature extraction by maximum margin criterion. Proceedings of Advances in Neural Information Processing Systems, Vol. 16, 97-104.
KAVALLIERATOU, E., FAKOTAKIS, N., AND KOKKINAKIS, G. (1999). New Algorithms for Skewing Correction and Slant Removal on Word Level. Proceedings of the 6th IEEE International Conference on Electronics, Circuits and Systems, Vol. 2, 1159-1162.
KAVALLIERATOU, E., FAKOTAKIS, N., AND KOKKINAKIS, G. (2000a). A Slant Removal Algorithm. Pattern Recognition, 33(7), 1261-1262.
KAVALLIERATOU, E., STAMATATOS, E., FAKOTAKIS, N., AND KOKKINAKIS, G. (2000b). Handwritten Character Segmentation Using Transformation-Based Learning. Proceedings of the 15th International Conference on Pattern Recognition, Vol. 2, 634-637.

KAVALLIERATOU, E., FAKOTAKIS, N., AND KOKKINAKIS, G. (2001). Slant estimation algorithm for OCR systems. Pattern Recognition, 34(12), 2515-2522.
KAVALLIERATOU, E., DROMAZOU, N., FAKOTAKIS, N., AND KOKKINAKIS, G. (2003). An Integrated System for Handwritten Document Image Processing. International Journal on Pattern Recognition and Artificial Intelligence, 17(4), 617-636.
KAPP, M.N., DE ALMENDRA FREITAS, C., AND SABOURIN, R. (2007). Methodology for the design of NN-based month-word recognizers written on Brazilian bank checks. Image and Vision Computing, 25(1), 40-49.
KIM, G. (1996). Recognition of off-line Handwritten Words and its Extension to Phrase Recognition. Ph.D. thesis, State University of New York at Buffalo.
KIM, G., AND GOVINDARAJU, V. (1997). A Lexicon Driven Approach to Handwritten Word Recognition for Real-Time Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), 366-379.
KIM, J. H., KIM, K. K., AND SUEN, C.Y. (2000). An HMM-MLP hybrid model for cursive script recognition. Pattern Analysis and Applications, 3, 314-324.
KIM, D. (2003). Slant Correction of Handwritten Strings Based on Structural Properties of Korean Characters. Pattern Recognition Letters, No. 12, 2093-2101.
KIM, G., GOVINDARAJU, V., AND SRIHARI, S. N. (1999). Architecture for Handwritten Text Recognition Systems. Advances in Handwriting Recognition, 163-182.
KIM, G., AND GOVINDARAJU, V. (1996). Efficient Chain-code-based Image Manipulation for Handwritten Word Recognition. Proceedings of SPIE - The International Society for Optical Engineering, Bellingham, WA, USA, Vol. 2660, 262-272.

KIM, K. K., KIM, J. H., AND SUEN, C. Y. (2002). Segmentation-based Recognition of Handwritten Touching Pairs of Digits using Structural Features. Pattern Recognition, 23(1), 13-21.
KIMURA, F., TSURUOKA, S., SHRIDHAR, M., AND CHEN, Z. (1992). Context-Directed Handwritten Word Recognition for Postal Service Applications. Proceedings of the 5th USPS Advanced Technology Conference, 199-213.
KIMURA, F., SHRIDHAR, M., AND NARASIMHAMURTHI, N. (1993). Lexicon-Directed Segmentation-Recognition Procedure for Unconstrained Handwritten Words. Proceedings of the Third International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, May 25-27, 122-131.
KIMURA, F., SHRIDHAR, M., AND CHEN, Z. (1993). Improvements of a Lexicon Directed Algorithm for Recognition of Unconstrained Handwritten Words. Proceedings of the Second International Conference on Document Analysis and Recognition, Tsukuba, Japan, 18-22.
KNERR, S., ANISIMOV, V., BARET, O., GORSKI, N., PRICE, D., AND SIMON, J.C. (1997). The A2iA Intercheque system: Courtesy amount and legal amount recognition for French checks. Automatic Bank Cheque Processing, 43-86.
KNERR, S., AUGUSTIN, E., BARET, O., AND PRICE, D. (1998). Hidden Markov Model Based Word Recognition and its Application to Legal Amount Reading on French Checks. Computer Vision and Image Understanding, 70(3), 404-419.
KOCH, PAQUET, T., AND HEUTTE, L. (2004). Combination of Contextual Information for Handwritten Word Recognition. 9th International Workshop on Frontiers in Handwriting Recognition, Kokubunji, Tokyo, Japan, 468-473.
KOERICH, A. L., SABOURIN, R., AND SUEN, C. Y. (2003). Large Vocabulary Off-line Handwriting Recognition: A Survey. Pattern Analysis and Applications, 6(2), 97-121.
KOERICH, A. L., SABOURIN, R., AND SUEN, C.-Y. (2004). Fast Two-Level HMM Decoding Algorithm for Large Vocabulary Handwriting Recognition. 9th International Workshop on Frontiers in
Handwriting Recognition, October 26-29, Kokubunji, Tokyo, Japan, 232-238.
KOERICH, A.L., SABOURIN, R., AND SUEN, C.Y. (2005). Recognition and verification of unconstrained handwritten words. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1509-1522.
KOERICH, A.L., BRITTO, A., OLIVEIRA, L.E.S., AND SABOURIN, R. (2006). Fusing high- and low-level features for handwritten word recognition. Proceedings of the Tenth International Workshop on Frontiers in Handwriting Recognition.
KITTLER, J., HATEF, M., DUIN, R., AND MATAS, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226-239.
KUNDU, HE, Y., AND CHEN, M. (2002). Alternatives to variable duration HMM in handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1275-1280.
LALLICAN, P.M., AND VIARD-GAUDIN, C. (1998). Off-line handwriting modeling as a trajectory tracking problem. International Workshop on Frontiers in Handwriting Recognition, IWFHR-6, Taejon, Korea, 347-356.
LECOLINET, E., AND CRETTEZ, J-P. (1991). A Grapheme-Based Segmentation Technique for Cursive Script Recognition. Proceedings of the 1st International Conference on Document Analysis and Recognition, St Malo, France, 740-748.
LEE, L., AND COELHO, S. (2005). A simple and efficient method for global handwritten word recognition applied to Brazilian bank checks. Proceedings of the 8th International Conference on Document Analysis and Recognition, 950-955.
LIOLIOS, N., FAKOTAKIS, N., AND KOKKINAKIS, G. (2002). On the Generalization of the Form Identification and Skew Detection Problem. Pattern Recognition, 35, 253-264.
LIU, J., AND GADER, P. (2002). Neural Networks with Enhanced Outlier Rejection Ability for Off-line Handwritten Word Recognition. Pattern Recognition, 35, 2061-2071.
LIU, C.-L., NAKASHIMA, K., SAKO, H., AND FUJISAWA, H. (2002). Handwritten digit recognition using state-of-the-art techniques.
In Proc. of the 8th International Workshop on Frontiers of Handwriting Recognition, 320-325.
LIU, C-L., AND NARUKAWA, K. (2004). Normalization Ensemble for Handwritten Character Recognition. 9th International Workshop on Frontiers of Handwriting Recognition, 69-74.
LIU, C-L., AND FUJISAWA, H. (2005). Classification and learning for character recognition: Comparison of methods and remaining problems. Proceedings of the International Workshop on Neural Networks and Learning in Document Analysis and Recognition, 57.
LONCARIC (1998). A Survey of Shape Analysis Techniques. Pattern Recognition, 31(8), 983-1001.
LORETTE (1999). Handwriting recognition or reading? What is the situation at the dawn of the 3rd millennium? International Journal on Document Analysis and Recognition, Vol. 2, 2-12.
LU, Y. (1995). Machine printed character segmentation - An overview. Pattern Recognition, 28(1), 67-80.
LU, Y., AND SHRIDHAR, M. (1996). Character Segmentation in Handwritten Words - An Overview. Pattern Recognition, 29, 77-96.
MADHVANATH, S., KLEINBERG, E., AND GOVINDARAJU, V. (1999). Holistic Verification of Handwritten Phrases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 1344-1356.
MADHVANATH, S., AND SHRIHARI, S. (1996). A Technique for Local Baseline Determination. Proceedings of the 5th International Workshop on Frontiers in Handwriting Recognition, 445-448.
MADHVANATH, S., AND GOVINDARAJU, V. (2001). The Role of Holistic Paradigms in Handwritten Word Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2).
MAIER, M. (1986). Separating Characters in Scripted Documents. Proceedings of the 8th International Conference on Pattern Recognition, Paris, 1056-1058.

MANTAS, J. (1986). An Overview of Character Recognition Methodologies. Pattern Recognition, 19, 425-430.
MARTI, U., AND BUNKE, H. (2001). Using a Statistical Language Model to Improve the Performance of an HMM-based Cursive Handwriting Recognition System. International Journal of Pattern Recognition and Artificial Intelligence, 15(1), 65-90.
MARTI, U., AND BUNKE, H. (2002). The IAM database: An English Sentence Database for Off-line Handwriting Recognition. International Journal of Document Analysis and Recognition, 15, 65-90.
MARTIN, G.L., RASHID, M., AND PITTMAN, J.A. (1993). Integrated segmentation and recognition through exhaustive scans or learned saccadic jumps. International Journal on Pattern Recognition and Artificial Intelligence, 7(4), 831-847.
MARINAI, S., GORI, M., AND SODA, G. (2005). Artificial neural networks for document analysis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(1), 23-35.
MILGRAM, J., CHERIET, M., AND SABOURIN, R. (2004). Speeding up the decision making of Support Vector Classifiers. Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition, 57-62.
MOHAMED, M. A., AND GADER, P. (1996). Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(5), 548-554.
MOHAMED, M.A., AND GADER, P. (2000). Generalized Hidden Markov Models - Part II: Application to handwritten word recognition. IEEE Transactions on Fuzzy Systems, 8, 82-94.
MORI, S., SUEN, C. Y., AND YAMAMOTO, K. (1992). Historical Overview of OCR Research and Development. Proceedings of the IEEE, 80, 1029-1058.
MORI, M., SUZUKI, A., SIHO, A., AND OHTSUKA, S. (2000). Generating new samples from handwritten numerals based on point
correspondence. 7th International Workshop on Frontiers of Handwriting Recognition, 281-290.
MORITA, M., FACON, J., BORTOLOZZI, F., GARNES, S., AND SABOURIN, R. (1999). Mathematical morphology and weighted least squares to correct handwriting baseline skew. In Proceedings of the International Conference on Document Analysis and Recognition, Vol. 1, Bangalore, 430-433.
MORITA, M., OLIVEIRA, L. S., AND SABOURIN, R. (2004). Unsupervised Feature Selection for Ensemble of Classifiers. IWFHR-9, 81-86.
NADIA, A., AND NAJOUA, E. (2006). Combining a Hybrid Approach for Features Selection and Hidden Markov Models in Multifont Arabic Characters Recognition. Proceedings of the Second International Conference on Document Image Analysis for Libraries (DIAL'06), 103-107.
NICCHIOTTI, G., AND SCAGLIOLA, C. (1999). Generalised Projections: A Tool for Cursive Handwriting Normalization. Proceedings of the 5th International Conference on Document Analysis and Recognition (ICDAR'99), Bangalore, India, 729-732.
NICCHIOTTI, G., SCAGLIOLA, C., AND RIMASSA, S. (2000). A Simple and Effective Cursive Word Segmentation Method. Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition, September, Amsterdam, ISBN 90-76942-01-3, Nijmegen: International Unipen Foundation, 499-504.
NISHIMURA, KOBAYASHI, M., MARUYAMA, M., AND NAKANO, Y. (1999). Offline character recognition using HMM by multiple directional feature extraction and voting with bagging algorithm. Proceedings of the 5th International Conference on Document Analysis and Recognition, 49-52.
OKUN, O., PIETIKAINEN, M., AND SAUVOLA, J. (1999). Robust Skew Estimation on Low-Resolution Document Images. 5th International Conference on Document Analysis and Recognition, 621.

OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2002). Automatic Recognition of Handwritten Numerical Strings: A Recognition and Verification Strategy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(11), 1438-1454.
OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2003a). A methodology for feature selection using multi-objective genetic algorithms for handwritten digit string recognition. International Journal on Pattern Recognition and Artificial Intelligence, 17(6), 903-930.
OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2003b). Feature Selection for Ensembles: A Hierarchical Multi-Objective Genetic Algorithm Approach. 7th International Conference on Document Analysis and Recognition, Vol. 2, 676-680.
OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., AND SUEN, C. Y. (2003c). Impacts of verification on a numeral string recognition system. Pattern Recognition Letters, 24(7), 1023-1031.
OLIVEIRA, L. S., AND SABOURIN, R. (2004). Support Vector Machines for Handwritten Numerical String Recognition. 9th International Workshop on Frontiers in Handwriting Recognition, October 26-29, Kokubunji, Tokyo, Japan, 39-44. Computer Society Press.
OLIVIER, C., PAQUET, T., AVILA, M., AND LECOURTIER, Y. (1995). Recognition of handwritten words using stochastic models. International Conference on Document Analysis and Recognition, 19-23.
OPITZ, D.W. (1999). Feature Selection for Ensembles. 16th Int. Conf. on Artificial Intelligence, 379-384.
OTSU, N. (1979). A Threshold Selection Method from Gray Level Histograms. IEEE Trans. on Systems, Man and Cybernetics, 9(1), 62-66.

PAL, U., BELAID, A., AND CHOISY, C. (2003). Touching Numeral Segmentation using Water Reservoir Concept. Pattern Recognition Letters, 24, 261-272.
PARTRIDGE, D., AND YATES, W.B. (1996). Engineering Multiversion Neural-net Systems. Neural Computation, 8(4), 869-893.
PASTOR, M., TOSELLI, A., AND VIDAL, E. (2004). Projection Profile Based Algorithm for Slant Removal. Proceedings of the International Conference on Image Analysis and Recognition, 183-190.
PINALES RUIZ, J., JAIME-RIVAS, R., AND CASTRO, M.J. (2007). Discriminative capacity of perceptual features in handwriting recognition. Telecommunications and Radio Engineering, 64(11), 931-937.
PLAMONDON, R., AND SRIHARI, S. N. (2000). On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 63-84.
PUDIL, P., NOVOVICOVA, J., AND KITTLER, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, 15, 1119-1125.
POSTL, W. (1986). Detection of Oblique Structures and Skew Scan in Digitized Documents. Proceedings of the International Conference on Pattern Recognition, 687-689.
PROCTER, S., AND ELMS, A. J. (1998). The Recognition of Handwritten Digit Strings of Unknown Length using Hidden Markov Models. Proceedings of the Fourteenth International Conference on Pattern Recognition (ICPR'98), 1515-1517.
PROCTER, S., AND ILLINGWORTH, J. (1999). Handwriting recognition using HMMs and a conservative level building algorithm. In Proc. 7th International Conference on Image Processing and its Applications, 736-739, Manchester.
RABINER, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257-286.

RAMESH, D.R., PIYUSH, M.K., AND MAHESH, D.D. (2006). Skew Angle Estimation and Correction of Hand Written, Textual and Large Areas of Non-Textual Document Images: A Novel Approach. Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition, Las Vegas, Nevada, USA, June 26-29, Vol. 2, 510-515.
SAMRAJYA, P., LAKSHMI, M., HANMANDLU, AND SWAROOP, A. (2006). Segmentation of Cursive Handwritten Words using Hypergraph. TENCON, IEEE Region 10 Conference, 1-4.
SARFRAZ, M., ZIDOURI, A., AND SHAHAB, S. A. (2005). A Novel Approach for Skew Estimation of Document Images in OCR System. Proceedings of Computer Graphics, Imaging and Vision: New Trends (CGIV'05), IEEE.
SARFRAZ, M., MAHMOUD, S. A., AND RASHEED, Z. (2007). On Skew Estimation and Correction of Text. Proceedings of the International Conference on Computer Graphics, Imaging and Visualization, 308-313.
SAYRE, K. M. (1973). Machine Recognition of Handwritten Words: A Project Report. Pattern Recognition, 5, 213-228.
SCAGLIOLA, C., AND NICCHIOTTI, G. (2000). Enhancing cursive word recognition performance by integration of all the available information. In Proc. 7th International Workshop on Frontiers in Handwriting Recognition, 363-372, Amsterdam, Netherlands.
SCHAMBACH, M.-P. (2005). Fast script word recognition with very large vocabulary. Proceedings of the 8th International Conference on Document Analysis and Recognition, 9-13.
SENIOR, A. W. (1994). Off-Line Cursive Handwriting Recognition Using Recurrent Neural Networks. PhD Dissertation, University of Cambridge, England.
SENIOR, A. W., AND ROBINSON, A. J. (1998). An Off-line Cursive Handwriting Recognition System. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 309-321.

SENIOR, W., AND ROBINSON, A.J. (2002). An off-line cursive handwriting recognition system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 309-321.
SERRA, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, London.
SHRIDHAR, M., AND KIMURA, F. (1995). Handwritten Address Interpretation using Word Recognition with and without Lexicon. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Piscataway, NJ, USA, Vol. 3, 2341-2346.
SIMON, J. C. (1992). Off-Line Cursive Word Recognition. Proceedings of the IEEE, 80, 1150-1161.
SIMONCINI, L., AND KOVACS-V, ZS. M. (1995). A System for Reading USA Census'90 Hand-Written Fields. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Vol. II, 86-91.
SIN, B.K., AND KIM, J.H. (1997). Ligature Modeling for Online Cursive Script Recognition. IEEE Trans. on PAMI, 19(6), 623-633.
SINGH, S., AND AMIN, A. (1999). Neural network recognition of hand printed characters. Neural Computing & Applications, 8(1), 67-76.
SINHA, R. M. K., PRASADA, B., HOULE, G., AND SABOURIN, M. (1993). Hybrid Contextual Text Recognition with String Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 915-925.
SMITH, L. I. (2002). A Tutorial on Principal Components Analysis, 26 February 2002. http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring2003/PCA-tutorial
SRIHARI, S. N., AND LAM, S. W. (1995). Character Recognition. Technical Report, CEDAR-TR-95-1.
STEINHERZ, T., INTRATOR, N., AND RIVLIN, E. (1999). Skew Detection via Principal Components Analysis. Proceedings of the 5th International Conference on Document Analysis and Recognition, 153-156.

STEINHERZ, T., RIVLIN, E., AND INTRATOR, N. (1999). Off-line Cursive Script Word Recognition - A Survey. International Journal of Document Analysis and Recognition, Vol. 2, 90-110.
STEVENS, M. E. (1961). Automatic Character Recognition: A State-of-the-Art Report. National Bureau of Standards.
STEFANELLI, R., AND ROSENFELD, A. (1971). Some Parallel Thinning Algorithms for Digital Pictures. Journal of the Association for Computing Machinery, 18, 255-264.
STEFANO, C. D., AND MARCELLI, A. (2002). From Ligatures to Characters: A Shape-based Algorithm for Handwriting Segmentation. Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR'02), 473-478.
SUEN, C. Y., LEGAULT, R., NADAL, C., CHERIET, M., AND LAM, L. (1993). Building a New Generation of Handwriting Recognition Systems. Pattern Recognition Letters, 14(4), 305-315.
SUEN, C.Y., AND TAN, J. (2005). Analysis of errors of handwritten digits made by a multitude of classifiers. Pattern Recognition Letters, 26(3), 369-379.
TAIRA, E., UCHIDA, S., AND SAKOE, H. (2004). Non-uniform slant correction for handwritten word recognition. IEICE Transactions on Information and Systems, E87-D(5), 1247-1253.
TAPPERT, C. C., SUEN, C. Y., AND WAKAHARA, T. (1990). The State of the Art in On-line Handwriting Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(8), 787-793.
TAKAHASHI, GRIFFIN, T. (1993). Recognition enhancement by linear tournament verification. Proceedings of the 2nd International Conference on Document Analysis and Recognition, 585-588.

TAY, Y. H. (2002). Offline Handwriting Recognition using Artificial Neural Network and Hidden Markov Model. PhD thesis, p. 78.
TAY, Y. H., KHALID, M., YUSOF, R., AND VIARD-GAUDIN, C. (2003). Offline Cursive Handwriting Recognition System based on Hybrid Markov Model and Neural Networks. Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, Kobe, Japan, 1190-1195.
TEOW, L. N., AND LOE, K. F. (2002). Robust vision-based features and classification schemes for off-line handwritten digit recognition. Pattern Recognition, 35(11), 2355-2364.
TOMOYUKI, H., TAKUMA, A., AND BUNPEI, I. (2007). An Analytic Word Recognition Algorithm using a Posteriori Probability. Proceedings of the 9th International Conference on Document Analysis and Recognition, Vol. 02, 669-673.
TRIER, O.D., JAIN, A.K., AND TAXT, T. (1996). Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, 29(4), 641-662.
UCHIDA, S., TAIRA, E., AND SAKOE, H. (2001). Non-uniform slant correction using dynamic programming. Proceedings of the 6th International Conference on Document Analysis and Recognition, Vol. 1, 434-438.
VALENTINI, AND DIETTERICH, T. G. (2002). Bias-Variance Analysis and Ensembles of SVM. 3rd International Workshop on Multiple Classifier Systems, 222-231.
VARGA, T., AND BUNKE, H. (2003). Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System. In Proceedings of the 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, 618-622.
VELOSO, L. R., SOUSA, R. P. DE, AND CARVALHO, J. M. (2000). Morphological Cursive Word Segmentation. XIII Brazilian Symposium on Computer Graphics and Image Processing, 337-342.

VERMA, B., AND BLUMENSTEIN, M. (1996). An Intelligent Neural System for a Robot to Recognize Printed and Handwritten Postal Addresses. Proceedings of the Fourth IASTED International Conference on Robotics and Manufacturing, IASTED RM'96, Hawaii, USA, 80-84.
VERMA, B., BLUMENSTEIN, M., AND KULKARNI, S. (1998). Recent Achievements in Off-line Handwriting Recognition Systems. Proceedings of the Second International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'98), Gippsland, Australia, 27-33.
VERMA, B., AND GADER, P. (2000). Fusion of Multiple Handwritten Word Recognition Techniques. Neural Networks for Signal Processing X, Proceedings of the 2000 IEEE Signal Processing Society Workshop, Vol. 2, 926-934.
VERMA, B.K., GADER, P., AND CHEN, W. (2001). Fusion of Multiple Handwritten Word Recognition Techniques. Pattern Recognition Letters, 22(9), 991-998.
VERMA, B. (2002). A Contour Character Extraction Approach in Conjunction with a Neural Confidence Fusion Technique for the Segmentation of Handwriting Recognition. Proceedings of the 9th International Conference on Neural Information Processing, Vol. 5, 2459-2463.
VERMA, B. (2003). A Contour Code Feature Based Segmentation for Handwriting Recognition. Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR'03), 1203-1207.
VIARD-GAUDIN, C., LALLICAN, P.-M., AND KNERR, S. (2005). Recognition-directed recovering of temporal information from handwriting images. Pattern Recognition Letters, 26(16), 2537-2548.
VINCIARELLI, A., AND LUETTIN, J. (1999). Off-line cursive script recognition. IDIAP Research Report IDIAP-RR 00-43.
VINCIARELLI, A., AND LUETTIN, J. (2001). A New Normalization Technique for Cursive Handwritten Words. Pattern Recognition Letters, 22, 1043-1050.

VINCIARELLI, A. (2002). A survey on off-line cursive word recognition. Pattern Recognition, 35(7), 1433-1446.
VUURPIJL, L., SCHOMAKER, L., AND VAN ERP, M. (2003). Architectures for detecting and solving conflicts: two-stage classification and support vector classifiers. International Journal on Document Analysis and Recognition, 5(4), 213-223.
WATANABE, M., HAMAMOTO, Y., YASUDA, T., AND TOMITA, S. (1997). Normalization Techniques of Handwritten Numerals for Gabor Filters. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, IEEE, Los Alamitos, CA, Vol. 1, 303-307.
WANG, D., AND SRIHARI, S. N. (1991). Analysis of Form Images. Proceedings of the International Conference on Document Analysis and Recognition, 181-186.
WANG, L., WANG, X., AND FENG, J. (2006). On Image Matrix Based Feature Extraction Algorithms. IEEE Transactions on Systems, Man and Cybernetics: Cybernetics, Vol. 36(1), 194-197.
WANG, X., DING, X., AND LIU, C. (2005). Gabor filters based feature extraction for character recognition. Pattern Recognition, 38(3), 369-379.
WEN, Y., LU, Y., AND SHI, P. (2007). Handwritten Bangla numeral recognition system and its application to postal automation. Pattern Recognition, 40(1), 99-107.
XIAO, X., AND LEEDHAM, G. (2000). Knowledge-based English Cursive Script Segmentation. Pattern Recognition Letters, 21, 945-954.
XU, Q., LAM, L., AND SUEN, C.Y. (2003). Automatic segmentation and recognition system for handwritten dates on Canadian bank cheques. In M. Fairhurst and A. Downton (Eds.), Proceedings of the 7th International Conference on Document Analysis and Recognition, 704-709.
YAMADA, H., AND NAKANO, Y. (1996). Cursive Handwritten Word Recognition Using Multiple Segmentation Determined by Contour Analysis. IEICE Transactions on Information and Systems, E79-D, 464-470.

YANIKOGLU, B., AND SANDON, P. A. (1998). Segmentation of Off-Line Cursive Handwriting using Linear Programming. Pattern Recognition, 31, 1825-1833.
YONG, J. Y., KIM, M. K., BANA, S. W., AND KWON, Y. B. (1997). Line Removal and Restoration of Handwritten Characters on the Form Documents. Proceedings of the 4th International Conference on Document Analysis and Recognition, 18-20, Vol. 1, 128-131.
YU, B., AND JAIN, A.K. (1996). A generic system for form dropout. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(11), 1127-1132.
ZEEUW, F. D. (2006). Slant Correction Using Histogram. Bachelor thesis, 3-4.
ZHANG, D., AND LU, G. (2004). Review of shape representation and description techniques. Pattern Recognition, Vol. 37, 1-19.
ZHOU, J., GAN, Q., KRZYZAK, A., AND SUEN, C-Y. (2000). Recognition and verification of touching handwritten numerals. 7th International Workshop on Frontiers of Handwriting Recognition, 179-188.
ZIMMERMANN, M., AND BUNKE, H. (2002). Hidden Markov model length optimization for handwriting recognition systems. International Workshop on Frontiers in Handwriting Recognition, Niagara-on-the-Lake, 369-374.

2
SEGMENTATION-BASED OFFLINE CURSIVE HANDWRITING RECOGNITION: CURRENT UPDATES AND RECENT ADVANCES
Amjad Rehman and Dzulkifli Mohamad

INTRODUCTION

Segmentation of handwriting is an operation that seeks to decompose a word image into sub-images of individual characters; it is therefore important to understand what a character is (Casey and Lecolinet, 1996). Segmentation is a difficult and error-prone process because of Sayre's paradox (1973): a character cannot be segmented before having been recognized and cannot be recognized before having been segmented. It seems that the character segmentation process requires that the properties of a character be known; this information may be obtained through recognition. Unfortunately, to obtain knowledge of a character's appearance, segmentation is required. Therefore it is obvious that one stage is dependent on the other, and knowledge of character symbol structure in a word is helpful in segmentation. Upon scrutinizing the literature, it is not difficult to observe that there are far more studies on the segmentation and recognition of cursive script than there are for hand-printed words. Several review papers
highlighted different issues in cursive handwriting segmentation and acknowledged the segmentation stage as the most difficult step in the process of cursive handwriting recognition (Casey and Lecolinet, 1996; Dunn and Wang, 1992; Lu, 1995; Lu and Shridhar, 1996; Elliman and Lancaster, 1990; Fujisawa et al., 1992; Steinherz et al., 1999; Plamondon and Srihari, 2000; Blumenstein and Verma, 2001; Vinciarelli, 2002; Gang et al., 2002; Koerich et al., 2003; Bortolozzi et al., 2005). To examine some of the most pioneering research in segmentation and recognition of cursive script, one must venture back to the early 1960s. This chapter hopefully meets the objective of outlining a good representative set of the techniques that have been proposed over the last five decades. A major problem in discussing segmentation is how to classify methods. Firstly, there are analytical methods that employ segmentation-based recognition strategies, where the segmentation can be explicit (El-Yacoubi et al., 1999; Arica and Yarman-Vural, 2002) or implicit (Gillies, 1992; Cho, 1995): explicit when the segmentation is based on cut rules, and implicit when each pixel column is a potential cut location. In the case of explicit segmentation several algorithms have been proposed during the last few decades. They normally take into consideration a set of heuristics and information from the foreground pixels (Kim et al., 2002), the background pixels (Pal et al., 2003), or a combination of both (Oliveira et al., 2002) in order to generate potential segmentation cuts. Generally, the heuristics used to make an algorithm robust also make it specific to the applied problem, and a good segmentation algorithm for numeral strings may not have the same performance for words, and vice versa. Secondly, Tappert et al. (1990) classified segmentation into "external" vs. "internal", depending on whether recognition is required in the process. Dunn and Wang (1992) termed these "straight segmentation" and "segmentation-recognition". After reviewing the available literature, we have concluded that there are two main strategies for segmentation, plus
numerous hybrid approaches that are combinations of the first two. The basic strategies are:

1. Explicit segmentation, in which segments are identified based on "character-like" properties. This process of cutting up the image into meaningful components is given a special name, "dissection", in the discussions below.

2. Implicit segmentation, in which the system searches the image for components that match classes in its alphabet.

3. The holistic approach, where each word in the lexicon is built into a model; this generally applies to small and fixed lexicons.

Based on these fundamental strategies, the review of the segmentation and recognition strategies is organized as shown in figure 1.

Figure 1

Hierarchy of word segmentation/recognition methods. (Reprinted from Cheriet et al., 2007)

CLASSICAL SEGMENTATION/DISSECTION BASED APPROACH

One of the first techniques for segmenting handwritten cursive script was proposed by Frishkopf and Harmon (1961). The authors introduced the concept of analyzing cursive words in terms of upper, middle and lower zones. They constrained their writers so that they would write such letters as a, c, e etc. between a baseline (sometimes referred to as the lower baseline) and a parallel guideline above it (the upper baseline). They would then analyze the area between the upper baseline and the line delineating the top of the word, to locate ascenders in the text (upper loops from letters such as b and d). They also scanned between the lower baseline and the line delineating the bottommost area of the word to locate descenders (lower loops, located in such letters as g and j). Using ascender/descender information they were able to locate important features in each word that could assist in the segmentation process. Some of the features that were found included vertical extremes about the baseline and special marks. Other features that were detected did not rely on ascender/descender information; these included retrograde strokes (strokes that go from left to right in various letters). The authors then proceeded to estimate the average character width of each word by counting the number of vertical crossings about the centre-axis (the line located between the upper and lower baseline). They used the features described above along with the average character width information to locate segmentation points. Since Frishkopf and Harmon's study, many researchers have included baseline location as an integral part of their handwriting recognition systems. The next system to be discussed was one of the first to employ dissection via "pre-segmentation". Pre-segmentation is the idea of dissecting words in areas containing specific features that are likely to occur within or between characters. An example of a character that may be divided could be a u, as its contour is
comprised of a valley that may be interpreted as a ligature (a connection between characters in cursive writing). The shapes that remain as a result of pre-segmentation are sometimes called "graphemes" or "pseudo-characters" (Casey and Lecolinet, 1996) or "junks" (Tay, 2002). To allow efficient post-processing, a grapheme is required to satisfy certain criteria. A character should be defined by no more than two graphemes, and a grapheme should represent no more than two or three characters. Following pre-segmentation, a contextual post-processing stage may be used to merge or split the graphemes so that the result is a set of accurately matched symbols. In the system proposed by Sayre (1973), words were segmented into graphemes based on specific characteristics or features located in word images. A tentative classifier was used to group the graphemes into one of 17 non-exclusive categories that corresponded to letters or groups of letters. This was followed by final identification based on a statistical decision tree classifier. Sayre's system was the first (other than that described by Earnest, 1962) to be based on features that were not dependent on stroke sequence (information inherently available in on-line handwriting recognizers). Sayre's research may be the first to draw the distinction between off-line and on-line cursive handwriting recognition. Sayre achieved a 79% word recognition rate on 84 words. The segmentation of cursive writing into primitives followed by some sort of contextual processing has been quite popular amongst researchers, e.g., Ehrich and Koehler (1975), Maier (1986), and Lecolinet and Crettez (1991). The algorithms proposed by Maier (1986) and Lecolinet and Crettez (1991) are mainly based on the detection of the valleys of the upper profile of the word and do not use further information about the actual shape of the ligatures. These techniques, because of their extreme simplicity, are prone to erroneous ligature detection, for instance in the case of loops that are not actually closed or when a valley occurs inside a character. The rules that are used to dissect handwritten script are based on heuristics that are in turn based on a visual analysis of the handwritten word. As mentioned previously,
many systems have focused on the location of ligatures between letters in cursive text. Some systems focus on ligatures that are close to the baseline; unfortunately, problems are inherent with certain characters that do not contain ligatures close to the baseline, such as o, b, v etc. Other problems occur when certain characters themselves contain components that resemble ligatures. These and other reasons relating to the ambiguity of cursive text are why it is necessary to employ contextual processing following the initial dissection. Ligatures may be detected by locating minima in the upper contour of words, as presented by Holt et al. (1992). Other than the upper contour information, the authors used a set of rules based on contour direction, the location of holes, and the upper and lower baselines. As an example, segmentation points were marked if a minimum in the upper contour was located, except if the contour component in question formed part of a hole. The upper contour was also used by Kimura et al. (1993) to determine segmentation points in handwritten postal words as part of an entire hybrid recognition system. Bozinovic and Srihari (1989) attempt to locate possible segmentation points based on proximity to minima in the lower contour and other rules that force segmentations in areas that are between two distant segmentation points. A technique proposed by Cheriet (1993) for extracting "key letters" in cursive script analyses face-up and face-down valleys along with open loop regions. Cheriet employs background analysis to achieve segmentation. Han and Sethi (1995) proposed an algorithm for segmentation of handwritten words based on a number of features such as crossing points, loops, and concave and convex points. They reported that 50 real-world address images were segmented with an accuracy of 85.7%. Yamada and Nakano (1996) proposed a cursive word segmentation method using multiple candidate points to determine the most suitable points based on contour features. Reasonable recognition rates were obtained when the segmentation algorithm was used as part of a complete word
recognition system. The top word recognition rate was 91.7% with a lexicon size of 50. Kim et al. (1997) proposed a segmentation method that uses a combination of ligatures and concavity features on the contour to determine the segmentation points. The number of segmentation points is kept to a minimum while ensuring that a segmentation point exists to split touching characters, and the maximum number of segmentation points per character is four. They have listed a 96.8% recognition rate for a small lexicon size. Eastwood et al. (1997) trained a neural network with feature vectors representing Possible Segmentation Points (PSPs) and complement features that represented the absence of a segmentation point. The feature vectors were manually obtained from training and test words in the CEDAR benchmark database. The accuracy of the network on a test set of PSPs was 75.9%. Bretto et al. (1997, 2002) proposed a segmentation algorithm using hypergraph theory. They employ hypergraph theory for developing image-processing applications such as segmentation. However, the segmentation they have dealt with is for picture images and not for cursive handwritten words. Favata (1997) proposed a system to segment and recognize the characters of a word based on a character model word recognition approach. A segmentation algorithm over-segmented the word at possible segmentation points. Possible character interpretations of strokes between two segmentation points were produced by using window-based scanning. The scanning procedure was used to extract strokes by examining a word from left to right and sending them to an OCR system. All possible combinations of the word are represented by constructing an augmented, directed graph. A beam search matching algorithm was used in conjunction with a lexicon that constrains the graph to locate the best possible match. A recognition accuracy of 81% is reported for the top choice using a 1000-word lexicon. Yanikoglu and Sandon (1998) developed a system that was used to locate successive segmentation points in cursive script by evaluating a cost function at each point on the baseline. The
decision to segment at a particular point and angle is determined by the weighted sum of four features that pertain to global characteristics of the writing. These weights are obtained by performing linear programming. The authors employ the name "style parameters" to describe the features used in their analysis. They pertain to such information as the average character width, pen thickness and distance from the previous segmentation point. Accuracy of up to 92% was reported on their practice database of words. Dimauro et al. (1998) proposed an advanced approach for segmenting cursive words as part of a recognition system to read the amounts on Italian bank cheques. The segmentation technique is based on a hypothesis-then-verification strategy. Initially the entire word image is searched, and connected components are located within the image. Each "block" detected via this process is passed to a recognizer. If the block is rejected, a hypothesis is generated to split the block by using a "drop falling" algorithm. The algorithm employs a number of rules that analyze the background of the image to determine the first cutting point. They then employ a descending procedure that simulates a "drop-falling" process. The dropping procedure is guided by rules that take into account neighboring pixels and a regional analysis of the upper contour to form an appropriate segmentation path. The hypothesis is then verified by classifying the strokes that have originated as a result of segmentation. A nearest neighbor technique is employed for this process. If the stroke is classified with high confidence the segmentation hypothesis is accepted; otherwise, a different hypothesis is considered. Segmentation accuracy results were not presented by the authors, but they indicated that the new approach improved the recognition of cursive words on bank cheques by 6%. Blumenstein and Verma reported a feature-based, heuristic segmenter (Verma and Blumenstein, 1996; Blumenstein and Verma, 1997). Following that, they integrated it with an ANN for the validation of segmentation points (Verma 1998; Blumenstein and Verma, 1998a; Blumenstein and Verma, 1998b; Blumenstein and Verma 1998c) for real-world handwritten words. The heuristic
algorithm was used to over-segment each handwritten word, whereby a neural-based validation technique was applied to verify whether each segmentation point was "valid" or "invalid". However, the segmentation of all characters was based on the size of a segregated character; if there is no segregated character in the cursive script, the whole process does not seem to be successful. Likewise, the accuracy of the estimate of character segmentation (based on the size of a segregated character) is not satisfactory for offline script (Madhvanath and Govindaraju, 2001). Hence the heuristic algorithm could not deal with the varying sizes of characters and words input to the system. Blumenstein and Verma (1999a, 1999b) extended the previous segmentation technique by adding a contour feature extraction module to detect valleys based on immediate slope change and vertical pixel density, in order to add more segmentation points. Nevertheless, the technique could not deal with segmentation of horizontally overlapped characters. Finally, all features were integrated into a fully functional word recognition system (Blumenstein and Verma, 1999a, b). Xiao and Leedham (2000) proposed a knowledge-based technique for cursive word segmentation. They identify prominent structures in English letters as well as analyzing face-up and face-down background transitions. They locate connected components consisting of more than one letter and over-segment them based on the face-up and face-down background regions. These over-segmented subcomponents are then merged based on their joining properties and knowledge derived from the structure of characters. The authors reported a correct segmentation rate of 82.9% on a subset of words from the CEDAR database and a 78.3% correct segmentation rate on a self-collected dataset. Nicchiotti et al. (2000) developed a simple but effective segmentation algorithm. The algorithm consists of three main steps: (1) possible segmentation point detection; (2) determining the cut direction; and (3) merging of over-segmented strokes into one character with the help of some heuristic rules. The authors reported results of 86.9% on a subset of words from the CEDAR database.
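
Several of the dissection techniques above share the same first step: scan the upper contour of the binarized word image and treat local minima (valleys) as candidate ligature positions. The following Python sketch illustrates that idea only; the foreground convention (ink = 1), the minimum spacing between cuts and the absence of any loop or hole checks are illustrative assumptions, not details taken from the cited systems.

import numpy as np

def upper_profile(img):
    # Row index of the topmost ink pixel in every column (-1 for empty columns).
    # img: 2-D numpy array, ink = 1, background = 0, row 0 at the top of the word.
    profile = np.full(img.shape[1], -1, dtype=int)
    for col in range(img.shape[1]):
        rows = np.flatnonzero(img[:, col])
        if rows.size:
            profile[col] = rows[0]
    return profile

def candidate_cuts(img, min_gap=8):
    # Columns where the upper profile dips down relative to both neighbours,
    # i.e. valleys of the upper contour -- a crude ligature indicator.
    prof = upper_profile(img)
    cuts = []
    for col in range(1, img.shape[1] - 1):
        left, here, right = prof[col - 1], prof[col], prof[col + 1]
        if here == -1 or left == -1 or right == -1:
            continue
        if here >= left and here >= right:             # larger row index = lower on the page
            if not cuts or col - cuts[-1] >= min_gap:  # keep candidate cuts apart
                cuts.append(col)
    return cuts

Real dissection systems add further rules on top of such candidates, for example rejecting minima that lie inside closed loops (Holt et al., 1992) or merging over-segmented strokes afterwards (Nicchiotti et al., 2000).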

Veloso et al. (2000) hypothesized character segmentation based on natural segmentation points and ligatures. A natural segmentation point corresponds to a character that is not connected to its neighbours, and is detected using histogram projections taken from five different angles. Ligature candidates are obtained from the morphological operations of opening and closing, while a genetic algorithm is applied to search for the best structuring elements for detecting ligatures in the set of training words. Finally, the Viterbi algorithm is used to determine qualified ligatures. Figure 2 shows a word segmentation example using this technique.

Figure 2

Example of word segmentation (reprinted from Veloso et al., 2000)

Verma and Gader (2000) put forward a hybrid approach that combines three script recognition techniques (MUMLP, GUMLP and MURBF) built on back-propagation and radial basis function neural networks. Two different segmentation techniques were adopted for the experiments. A modification of the conventional Borda count, based on rank and confidence (cf) values, was used to combine the three techniques:

MBC = (rank * weight * cf)tech1 + (rank * weight * cf)tech2 + (rank * weight * cf)tech3
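
A minimal Python sketch of this kind of weighted Borda count fusion is shown below. The labels, weights and confidence values are invented for illustration, and the exact rank weighting used by Verma and Gader (2000) may differ from the simple scheme assumed here.

def modified_borda_count(rankings, weights):
    # rankings: one list per classifier of (label, confidence) pairs, best first.
    # weights:  one weight per classifier.
    # Returns the candidate labels sorted by their fused score.
    scores = {}
    for ranked, weight in zip(rankings, weights):
        n = len(ranked)
        for position, (label, conf) in enumerate(ranked):
            rank_value = n - position            # best-ranked candidate gets the largest value
            scores[label] = scores.get(label, 0.0) + rank_value * weight * conf
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative use with three hypothetical word classifiers:
out1 = [("Boston", 0.9), ("Buffalo", 0.4)]
out2 = [("Buffalo", 0.7), ("Boston", 0.6)]
out3 = [("Boston", 0.8), ("Albany", 0.3)]
print(modified_borda_count([out1, out2, out3], weights=[1.0, 1.0, 1.0]))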

MUMLP is over-segmentation based, with a multilayer perceptron trained using back-propagation and dynamic programming. GUMLP is similar to the MUMLP system, with two important differences: 1) it has no neural-network-based character compatibility, and 2) it uses the heuristic algorithm of Blumenstein and Verma (1999a, 1999b). MURBF uses a radial basis function neural network. The optimum solution was obtained from 1000 randomly drawn words from the CEDAR database. With the proposed Borda count algorithm, a recognition rate of 91% was reported. Verma et al. (2001) proposed a fusion of segmentation-based recognition techniques which achieves a 91% recognition rate on a large lexicon. Blumenstein and Verma (2001) generated possible segmentation points (PSPs) using the heuristic feature-based segmenter proposed by Blumenstein and Verma (1999a, 1999b). Incorrect PSPs were removed by calculating segmentation confidence values using three ANNs. The first ANN provides correct and incorrect segmentation confidence values, the second ANN computes the left character confidence (LCC), and the third ANN provides the center character confidence (CCC) value. A high confidence value marks a good candidate for segmentation. The confidence from the first ANN is calculated as below.

f(confidence) = max(f(CSP), f(ISP))

Where CSP is the correct segmentation point and ISP is the incorrect segmentation point. Two main experiments were conducted using the CEDAR BD/Cities benchmark database. The first experiment was based on the SPV technique of Blumenstein and Verma (1999a, 1999b), while the second was based on the fusion of neural confidences. A comparison of segmentation errors for the two experiments is shown in figure 3.

Figure 3

Comparison of segmentation errors for experiments 1 and 2 (reprinted from Blumenstein and Verma (2001))
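
A sketch of how such a validation stage might combine the outputs of the three networks is given below. The predict interface, the simple averaging of the three confidences and the acceptance threshold are assumptions made for illustration; they are not the exact fusion rule of Blumenstein and Verma (2001).

def validate_segmentation_points(psps, spv_net, lcc_net, ccc_net, threshold=0.5):
    # psps:   list of feature vectors, one per possible segmentation point
    # *_net:  hypothetical trained networks exposing predict(features) -> value in [0, 1]
    accepted = []
    for features in psps:
        spv = spv_net.predict(features)   # segmentation point validation confidence
        lcc = lcc_net.predict(features)   # left character confidence (LCC)
        ccc = ccc_net.predict(features)   # center character confidence (CCC)
        fused = (spv + lcc + ccc) / 3.0   # assumed fusion: simple average
        if fused >= threshold:
            accepted.append(features)     # keep only confident segmentation points
    return accepted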

Verma (2002) carried out the segmentation step starting with baseline detection. The word image was over-segmented, left and right characters were extracted, and finally a joint confidence value was evaluated by an ANN to validate the segmentation points. The baseline was found by the conventional method of locating a significant change in the horizontal histogram density. PSPs were found using the heuristic algorithm of Blumenstein and Verma (2001), based on structural features of the word image. Verma (2003) proposed rule-based segmentation of handwritten words. Following heuristic segmentation, a set of rules was proposed to check the validity of the existing segmentation
points and to recover missed segmentations. Five reference lines were detected, which made the entire process computationally expensive. Finally, the rules for removing and inserting segmentation lines were based on weak assumptions; a neural network was trained for those assumptions, but the ANN needed a lot of training. A segmentation accuracy of 81.08% on the CEDAR database was reported. The problem of segments that include part of another character was solved by finding the connected black boundary between the first and second segmentation points, as illustrated in figure 4. To evaluate the confidence value, three ANNs were employed, as by Verma and Gader (2000). Likewise, 300 words from the BD/Cities directory of CEDAR were used for the experiments. The experiment was compared with a similar algorithm proposed by Verma (2002). The correct and rejected incorrect segmentation results of the algorithm proposed by Verma (2002) and of the novel algorithm are shown in table 1.

Figure 4

Character extraction (Verma, 2002)


Table 1

Segmentation Results (reprinted from Verma, 2002)

Before concluding this section, it is worth outlining a number of the more recent studies employing some form of dissection or pre-segmentation. Ghosh et al. (2004) proposed a direct segmentation approach in their fully automated offline handwriting recognition system. The segmentation is done using a large heuristic-based set of rules applied in an iterative manner, followed by a neural network validation system. The segmentation results (with ANN validation) reported 10.8%, 0.2% and 5.4% for over-segmentation, missed and bad segmentation respectively. Cheng et al. (2004) enhanced script segmentation based on neural confidence by employing the feature-based heuristic segmenter of Blumenstein and Verma (1999a, b). For segment point validation (SPV), validation of the left contour (LC) and center contour (CC) were introduced, as proposed by Blumenstein and Verma (2001). Finally, to enhance the whole segmentation process, modified direction features (MDF) were added (Blumenstein et al., 2004). MDF firstly calculates changes from background to foreground pixels vertically and horizontally using two values, and secondly the left and destination transitions (LT, DT). Overlapped and closely spaced characters are handled using segmentation path direction (SPD), a character extraction method. SPD uses a base-fit line to determine a foreground pixel that becomes the starting point (the starting point must be on top of the base-fit line, see Figure 5).
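The transition-counting idea behind MDF can be illustrated with the short sketch below, which simply counts background-to-foreground transitions along the rows and columns of a binary image. This is only an approximation of the modified direction feature of Blumenstein et al. (2004), which also records transition positions (LT, DT) and direction values.

import numpy as np

def transition_counts(binary_img):
    """Count background-to-foreground (0 -> 1) transitions per row and per
    column of a binary character image.  Only the counting part of the MDF
    idea is shown; LT/DT position values are not computed here."""
    img = binary_img.astype(int)
    # A 0->1 step occurs where the difference along the scan direction is +1.
    row_tr = np.sum(np.diff(img, axis=1) == 1, axis=1)   # per row (horizontal scan)
    col_tr = np.sum(np.diff(img, axis=0) == 1, axis=0)   # per column (vertical scan)
    return row_tr, col_tr

# Toy 5x5 character-like pattern.
img = np.array([[0, 1, 0, 1, 0],
                [0, 1, 0, 1, 0],
                [0, 1, 1, 1, 0],
                [0, 1, 0, 0, 0],
                [0, 1, 0, 0, 0]])
rows, cols = transition_counts(img)
print(rows)   # -> [2 2 1 1 1]
print(cols)   # -> [0 0 1 0 0]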


Figure 5

Word sample sections and segmentation path generation (reprinted from Cheng et al., 2004)

The extraction path is found by exploring to the right from the starting point until the upper row of the image can be reached or, if the right path fails, by exploring to the left from the starting point. The data used in the experiments were taken from the CEDAR benchmark database. The segmentation result using SPD on 317 words reported a correct extraction percentage of 95.27%. Cheng and Blumenstein (2005a) improved their own previous work (Cheng et al., 2004; Cheng and Blumenstein, 2005b) by proposing an enhanced heuristic segmenter (EHS) with neural-based segmentation and possible segmentation point (PSP) validation to improve the segmentation of cursive handwriting. Validation of segmentation points used segment point validation (SPV), left character validation (LCV) and center character validation (CCV), following the approach of Blumenstein and Verma (2001). To identify PSPs more clearly, EHS used ligature detection and a neural assistant. Ligature detection was determined from the minimum value of a modified vertical histogram at the baseline. The baseline was established by calculating the average of the maxima and minima points, after abnormal maxima and minima had been removed. The neural assistant added segmentation points based on a confidence value using MDF extraction and the distance between two PSPs. The CEDAR benchmark database was used for training and experiments. Results for EHS with
neural assistance were reported as 7.37%, 0.1% and 6.79% for over-, missed and bad segmentation respectively. Cheng and Blumenstein (2005b) developed an enhanced heuristic segmenter (EHS) that improved the feature-based heuristic segmentation (FHS) algorithm of Verma (2002) by making use of a feature extraction technique (MDF) for the representation of the segmentation area (SA), left character (LC) and center character (CC). EHS was examined using the CEDAR database with 317 words. The segmentation error rates are shown in Table 2.

Table 2

Segmentation Error Rates
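The ligature detection step used by the EHS described above can be roughly sketched as follows: the vertical projection histogram of a narrow band around the baseline is scanned for low-ink columns, which become prospective segmentation (ligature) columns. The band width and ink threshold below are assumptions; the published segmenter uses a modified histogram and a neural assistant on top of this.

import numpy as np

def ligature_columns(binary_word, baseline, band=3, max_ink=1):
    """Return candidate ligature columns: columns whose ink count inside a
    narrow band around the baseline is at most `max_ink` (low points of the
    vertical projection histogram)."""
    top = max(0, baseline - band)
    bottom = min(binary_word.shape[0], baseline + band + 1)
    strip = binary_word[top:bottom, :]
    vhist = strip.sum(axis=0)                 # vertical projection in the band
    return np.where(vhist <= max_ink)[0]      # column indices of likely ligatures

# Toy example: two character "blobs" joined by a thin stroke at columns 8-11.
img = np.zeros((12, 20), dtype=int)
img[3:10, 1:8] = 1        # first character body
img[3:10, 12:19] = 1      # second character body
img[8, 8:12] = 1          # thin connecting stroke near the baseline (row 8)
print(ligature_columns(img, baseline=8))  # columns 0, 8-11 and 19 carry <= 1 ink pixel in the band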

Samrajya et al. (2006) proposed a hypergraph model to segment cursive handwritten words. The hypergraph model treats an image as packets of pixels. The authors claimed that, by recombining these packets of different sizes, a given image can be segmented with the guarantee that at least one of the combinations provides a correct segmentation. No segmentation results were put forward for comparison; moreover, the technique does not seem to yield successful results for horizontally overlapped and touching characters.


Bishnu and Chaudhuri (2007) proposed an explicit approach for segmenting Bangla handwritten text into characters by recursive contour following. The technique is based on the structural features of the script. Dawoud (2007) introduced the iterative cross section sequence graph (ICSSG) for character segmentation. ICSSG tracks the growth of characters at equally spaced thresholds, and the iterative thresholding reduces the information loss associated with image binarization. However, the experiments were performed on handwritten digits only.

RECOGNITION-BASED SEGMENTATION

An alternative aimed at avoiding the prior segmentation of the string has been the use of implicit segmentation-based methods that integrate the segmentation and recognition processes. A promising approach to achieving this has been based on Hidden Markov Models (HMMs). This approach was originally developed in the field of speech recognition (Rabiner, 1989), where it has been applied with much success. The benefits of applying such a technique to recognize printed words have been shown in (Bose and Kuo, 1994). Procter et al. (1998) proposed a method in which the approach of Elms et al. (1998) was adapted for handwritten numeral strings. Other works, such as Zimmermann and Bunke (2002), have also shown this approach to be a good method for recognizing handwritten words. From these studies, we may conclude that such an approach is a promising way of integrating segmentation and recognition to deal with the difficulties encountered in processing both handwritten numeral strings and handwritten words. However, Britto et al. (2001a, 2001b) have observed some cost attached to that integration: a loss in recognition performance is caused by combining segmentation with recognition. Thus, a segmentation/verification strategy appears suitable to compensate for the loss in recognition caused by the implicit segmentation strategy.
Bretto et al. (2002) presented a segmentation algorithm using hypergraph theory; they used hypergraph theory to develop image-processing applications such as segmentation. However, the segmentation they dealt with is for picture images and not for cursive handwritten words. Kavallieratou et al. (2000b) proposed a simplified variation of the transformation-based learning (TBL) method that extracts rules automatically to detect segment boundaries. It includes two stages: a pre-segmentation stage provides a first estimate of the segment boundaries, followed by a machine learning algorithm that refines the pre-segmentation and provides the final segment boundaries. Accuracy results of up to 82% are reported.

The previous section dealt with techniques that used some sort of pre-segmentation followed by contextual processing to finalize a segmentation decision. Conversely, the first two techniques described here initially generate a set of hypotheses by using windows to scan the image and then choose the best hypothesis. Burges et al. (1992) describe a technique called shortest path segmentation. It utilizes a combination of dynamic programming and neural classification. A number of equally spaced windows are involved in determining all possible "legal" cuts via their combination. A graph is created that contains nodes representing all acceptable window segments, and nodes are connected when they correspond to legal neighbors. Each node in the graph is assigned a distance (given by the neural network), and the shortest traversal through the graph represents the best segmentation/recognition of the word. Fukushima and Imagawa (1993) used a neural model to develop a "selective attention" technique. The technique utilized a search area with a size larger than one character and traversed the word image looking for features representing recognizable patterns. Such patterns would activate particular "cells" in the neural network. Once such a pattern was found, the network automatically segmented and recognized a character in the area. Following initial recognition, focus was shifted from the recognized pattern, and attention was switched to recognize another character in a neighboring region.
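The shortest path segmentation idea can be sketched as a dynamic programming search over candidate cut positions: each slice between cuts is scored as a possible character, and the lowest-cost traversal gives the segmentation. The cost function below is a placeholder; Burges et al. (1992) obtain these scores from a neural network over equally spaced windows.

def best_segmentation(cuts, segment_cost, max_width=3):
    """Dynamic programming over candidate cut positions.
    cuts         : sorted list of cut indices, including the word start and end.
    segment_cost : function (i, j) -> cost of treating the slice between
                   cuts[i] and cuts[j] as one character (e.g. the negative
                   log-confidence of a character classifier).
    max_width    : at most this many consecutive slices may be merged.
    Returns (total_cost, list of (start_cut, end_cut) character slices)."""
    n = len(cuts)
    INF = float("inf")
    best = [INF] * n
    back = [None] * n
    best[0] = 0.0
    for j in range(1, n):
        for i in range(max(0, j - max_width), j):
            c = best[i] + segment_cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i
    # Recover the path of cuts actually used.
    path, j = [], n - 1
    while back[j] is not None:
        path.append((cuts[back[j]], cuts[j]))
        j = back[j]
    return best[-1], path[::-1]

# Toy cost: slices roughly 10 pixels wide look most like single characters.
cuts = [0, 6, 12, 20, 30]
cost = lambda i, j: abs((cuts[j] - cuts[i]) - 10) / 10.0
print(best_segmentation(cuts, cost))   # merges the first two slices: [(0, 12), (12, 20), (20, 30)]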


In the recognition-based category, many other techniques exist which do not segment the image directly but instead segment a feature representation of the image. This category includes techniques that employ Hidden Markov model based schemes (Gilloux, 1993; Gilloux et al., 1993; Chen and Kundu, 1993). Chen et al. (1994, 1995) proposed an external segmentation algorithm based on singularities (islands) and regularities (bridges). Morphological features (opening and closing) were employed to extract the islands and bridges. However, the choice of the structuring element was closely related to the width and height of the strokes composing the word being analyzed, and there was a vast number of possible structuring elements from which to choose. For their approach, the overall recognition rate was 89.4%. Non-Markov techniques also exist, such as that proposed by Hayes (1980), who applied hierarchical relaxation to build a hierarchical description of a word as it was read. It first translated the word image into graph representations that modeled the stroke and letter structure of a word. The graphs stored information about all possible segmentations of the word, and relaxation procedures were performed that contextually reduced the number of possible segmentations. Another non-Markov approach was proposed by Simon (1992) and used the concept of singularities and regularities. A stroke graph representation was constructed from each thinned word image. Simon (1992) named two properties that are inherent in each word. The components that convey most of the information about a cursive word were termed "singular parts" (which may be described as the path that joins all characters in cursive script); these were obtained by removing the "regular parts" of the word. A description chain was derived from the singular parts to represent information such as robust features and characters in the word, and the remaining parts were analyzed using dynamic matching. As with each category so far, only a subset of recognition-based techniques was outlined above. Segmentation-based approaches try to segment a given word into smaller entities. However, as it is extremely difficult, if not impossible, to segment a given word into its individual characters without knowing the word's identity, they usually split a
word into entities that do not necessarily correspond to exactly one character each, and they consider a number of possible segmentation alternatives at the same time. Typically, an over-segmentation of the given input word is attempted. That is, the image of a character that occurs within a word may be broken into several constituents, also called graphemes. At the same time, the segmentation procedure avoids merging two adjacent characters, or parts of two adjacent characters, into the same constituent. A large number of heuristics for achieving this kind of segmentation have been reported in the literature (Casey and Lecolinet, 1996; Bunke and Wang, 1997). Once the given input word has been transformed, through segmentation, into a sequence of graphemes (g1, g2, ..., gn), all possible combinations of adjacent graphemes, up to a maximum number M, are considered and fed into a recognizer for isolated characters. Lu and Shridhar (1996) also use a lexicon to direct both word segmentation and recognition. Likewise, the approach of Sin and Kim (1997) models all handwritten Latin words or composite characters with a finite state network (FSN) using a set of Hidden Markov models. Each HMM is associated with either a letter or a ligature pattern, while the FSN stands for a word or character model, designed as a network of letter and ligature HMMs according to both the writing order and the spatial structure. Recognition is performed by applying either graph searching or dynamic programming algorithms, so that segmentation is achieved simultaneously with recognition. However, since it needs to learn HMMs for words, the proposed solution is language and lexicon dependent. Similarly, Tay et al. (2003) fused HMMs and ANNs for off-line cursive handwriting segmentation/recognition. A segmentation graph is generated that describes all ways to segment a word into letters. To recognize a word, an ANN computes the observation probabilities for the best path of each segmentation candidate in the segmentation graph. Finally, the best path probabilities are multiplied to compute the likelihood for each word in the lexicon using fused letter HMMs. Typically it is assumed that the recognizer not only returns an ordered list of class names, but also renders a confidence for each class.
Once all possible combinations of graphemes have been classified, a search procedure is applied that finds, based on the confidence values returned by the classifier, the best sequence of characters matching the input word image. Typically, dynamic programming or some A*-type search algorithm, such as that proposed by Ha et al. (1998), is used. The search procedure is often run under the control of a dictionary of legal words; as an alternative, the dictionary may be used in a post-processing phase. Many instances of this generic procedure have been reported in the literature (Favata, 2001; Liu and Gader, 2002).

An advantage of segmentation-based word recognition schemes as discussed above is that the problem is reduced to isolated character recognition, a problem for which a number of quite mature algorithms have become available. On the other hand, segmentation and grapheme recombination are both based on heuristic rules derived from human intuition. Cavalin et al. (2006) proposed a two-stage HMM-based method in which an implicit segmentation is applied to segment either words or numeral strings, and in the verification stage foreground and background features are combined to compensate for the loss in recognition rate that occurs when segmentation and recognition are performed in the same process. For a lexicon of size 3771, an 88.2% recognition rate was reported after the two-stage process. Hamamura et al. (2007) proposed an analytic word recognition algorithm using a posteriori probability ratios; a 9.1% improvement in recognition results was claimed. The development of automatic procedures that are able to learn segmentation rules from training data and automatically infer the parameters that guide the search for the optimal character hypotheses is still an open problem. In summary, the challenge is to find some way to compensate for the loss in recognition performance resulting from the necessary tradeoff between segmentation and recognition carried out in an implicit segmentation-based method (Bortolozzi et al., 2005). Implicit methods rest on the argument that, in the case of cursive script, segmentation cannot be achieved without recognition, because without understanding the characters included in the word
there are no good criteria to avoid segmentation errors. Nevertheless, there is evidence that basing segmentation upon recognition has some drawbacks; for example, words with broken, touching, illegible or missing characters cannot be recognized. This paradox has been discussed by Sayre (1973).
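A compact sketch of the grapheme recombination search discussed in this section: up to M adjacent graphemes are merged into one character hypothesis, a character recognizer scores each merge, and each lexicon word is scored by its best-scoring sequence of merges. The recognizer below is a placeholder returning invented log-probabilities; real systems use an ANN or HMM at that point.

import math

def word_log_score(n_graphemes, word, char_logprob, M=3):
    """Best log-score for `word` over all ways of merging the n graphemes
    into len(word) characters, each character using at most M graphemes.
    char_logprob(i, j, ch) returns the recognizer's log-probability that
    graphemes i..j-1 form the character ch."""
    n, L = n_graphemes, len(word)
    NEG = -math.inf
    # dp[i][k] = best log-score covering the first i graphemes with the first k letters.
    dp = [[NEG] * (L + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, L + 1):
            for w in range(1, min(M, i) + 1):          # graphemes merged into letter k
                prev = dp[i - w][k - 1]
                if prev > NEG:
                    s = prev + char_logprob(i - w, i, word[k - 1])
                    if s > dp[i][k]:
                        dp[i][k] = s
    return dp[n][L]

# Placeholder recognizer: prefers two-grapheme characters, ignores identity.
fake = lambda i, j, ch: math.log(0.8) if (j - i) == 2 else math.log(0.3)
lexicon = ["to", "top", "stop"]
scores = {w: word_log_score(6, w, fake) for w in lexicon}
print(max(scores, key=scores.get))   # "top" fits 6 graphemes best, as 2+2+2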

Hybrid Strategies

In this section, the strategies described are a mix of the two previously discussed (dissection-based and recognition-based). This category also uses pre-segmentation, but differs from the grapheme approach in that the requirements on the definition of a primitive are not as strict. Dissection is therefore applied to word images, and the objective is that the word should be over-segmented a sufficient number of times to ensure that all appropriate letter boundaries have been cut. To evaluate the best segmentations, a set of hypotheses is tested by merging segments of the image and invoking a classifier to score the combinations. Most techniques employ an optimization algorithm making use of some sort of dynamic programming technique, possibly incorporating contextual knowledge. The basic approach described above was proposed simultaneously by a number of researchers (Casey, 1992; Kimura et al., 1992; Bruel, 1994). Some researchers also incorporated lexical information as part of the process (Favata and Srihari, 1992; Sinha et al., 1993). Although a certain level of maturity has been reached, there is still an urgent need to further improve the available segmentation and recognition technology (Gatos et al., 2007). This concludes the section on cursive word segmentation. Many systems were briefly outlined in all the categories stipulated by Casey and Lecolinet (1996). In fact, some of the most successful systems have been segmentation based, even though this strategy has several problems. The detailed analysis in (Blumenstein and Verma, 2001;
Verma et al., 2004; Chen and Leedham, 2005) has shown that most existing segmentation algorithms have three major problems: (1) inaccurately cutting characters into parts; (2) missing many segmentation points; and (3) over-segmenting a character many times, all of which contribute to errors in the word recognition process. Most researchers have evaluated their segmentation accuracy in terms of overall word recognition performance. Additionally, the databases and experimental setups differ among researchers, so it is difficult, if not impossible, to compare their results. Nevertheless, some of the top results for segmenting cursive words are outlined in Table 3 for comparison. The list is by no means exhaustive; the objective is to provide a representative spread of the techniques that have been developed over the past two decades.


Table 3

Comparison of Segmentation Results


FEATURE EXTRACTION

A critical step in the cursive handwriting recognition process is the extraction and selection of a feature set for the classification of segmented characters or of whole words (holistic approaches), since a discriminative feature set is considered the most important factor in achieving highly accurate recognition performance. Based on the discriminative feature set, the classifier classifies the character or word. In this regard, Trier (1996) presented an interesting survey of feature extraction methods for off-line recognition of isolated characters. Likewise, reviews of shape analysis techniques can be found in (Loncaric, 1998; Zhang and Lu, 2004; Jiang and Zhang, 2004; Wang et al., 2006). In the last few decades, a considerable number of feature extraction and selection techniques have been proposed for character segmentation, character recognition and word recognition. Indeed, feature extraction for numeral recognition has almost matured, and competent results are reported in the literature (Suen et al., 1993; Gader et al., 1996a; Cho, 1997; Ha and Bunke, 1997; Suen, 1999; Cai, 1999; Liu, 1999; Dong, 2001; Britto et al., 2001b; Teow, 2002; Liu, 2002; Belongie, 2002; DeCoste and Schölkopf, 2002; Xu et al., 2003; Oliveira, 2004; Britto et al., 2004; Cavalin, 2006; Kapp et al., 2007; Wen et al., 2007). However, it is clear from recent studies that the same accuracy could not be achieved for cursive character recognition. This is mainly due to problems inherent in cursive characters, such as broken characters, overlapped/merged/touching characters and ambiguous characters arising from segmentation errors in cursive handwriting, as well as writer style, mood and speed. Therefore, research communities have handled these problems by investigating a variety of new and hybrid features for the classification/recognition of cursive characters (Wen et al., 2007; Kapp et al., 2007; Cheriet et al., 2007; Dawoud, 2007; Blumenstein et al., 2007; Gatos et al., 2006a; Wang, 2006; Wang et al., 2005; Britto et al., 2004; Blumenstein et al., 2004; Verma et al.,
2004; Günter and Bunke, 2004; Blumenstein et al., 2003; Vinciarelli et al., 2003; Camastra and Vinciarelli, 2003; Hanmandlu et al., 2003; Xu et al., 2003; Verma, 2003; Arica and Yarman-Vural, 2002; Gang et al., 2002; Fan and Verma, 2002; Blumenstein and Verma, 2001; Verma et al., 2001; Xiao and Leedham, 2000; Plamondon and Srihari, 2000; Blumenstein and Verma, 1999; Singh and Amin, 1999; Verma et al., 1998; Chiang, 1998; Yanikoglu and Sandon, 1998; Dimauro et al., 1998; Cho, 1997; Eastwood et al., 1997; Gader et al., 1997; Casey and Lecolinet, 1996; Gilloux, 1993; Srihari, 1993; Suen et al., 1993; Lu and Shridhar, 1996; Lu, 1995; Martin et al., 1993; Dunn and Wang, 1992; Fujisawa et al., 1992; Elliman and Lancaster, 1990). Despite the continuous efforts of the last two decades, the importance of a single feature in recognizing a character has still not been fully explored (Blumenstein et al., 2007), because there might be only one or two values that are significant for recognizing a particular segmented character/primitive. Hence, the selection of a discriminative feature set for character recognition is still debatable. Too many features create ambiguity for the classifier, lead to worse rather than better performance and make the classifier slow; whereas, if only a few features are extracted in order to speed up the process, insufficient information may be passed to the classifier and the error rate therefore increases. In a situation like this, a feature could have one of the following effects on the classifier:

(i) Contributes positively to the classification process.
(ii) Contributes negatively.
(iii) Does not do anything, i.e. sits idle.

If a feature contributes positively, it enhances the accuracy of classification. A feature with a negative impact adversely affects the accuracy, while a feature that sits idle neither improves nor degrades the accuracy of the classification process. However, it is always good practice to remove both the idle and the negatively impacting features from the feature set. Pudil et al.
(1994) have proposed a sequential floating feature selection scheme. This scheme has two approaches:

(i) Starting from a single feature, keep adding features until the optimal set is obtained; and
(ii) Starting from the whole set of features, keep removing features until the optimal set is obtained.
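A minimal sketch of the first (forward) direction of such a floating search: features are added greedily while an evaluation score improves, with a backward step that drops a feature again whenever that helps. The evaluate() function is a placeholder standing in for, e.g., cross-validated classifier accuracy, and the sketch simplifies the full SFFS procedure of Pudil et al. (1994).

def floating_forward_selection(all_features, evaluate, target_size):
    """Simplified sequential floating forward selection (SFFS-style sketch).
    all_features : list of candidate feature names/indices.
    evaluate     : function(feature_subset) -> score (higher is better)."""
    selected, best_score = [], evaluate([])
    while len(selected) < target_size:
        # Forward step: add the single best remaining feature.
        candidates = [f for f in all_features if f not in selected]
        if not candidates:
            break
        f_best = max(candidates, key=lambda f: evaluate(selected + [f]))
        selected.append(f_best)
        best_score = evaluate(selected)
        # Floating (backward) step: drop a feature if that improves the score.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: evaluate([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            if evaluate(reduced) > best_score:
                selected, best_score = reduced, evaluate(reduced)
            else:
                break
    return selected, best_score

# Toy evaluation: features 'a' and 'c' are genuinely useful, 'b' adds nothing.
useful = {"a": 0.3, "b": 0.0, "c": 0.25, "d": 0.05}
score = lambda subset: sum(useful[f] for f in subset) - 0.02 * len(subset)
print(floating_forward_selection(list(useful), score, target_size=3))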

To conclude, feature selection for a pattern recognition problem is associated with several factors, such as accuracy, the required learning time and the necessary number of samples. Recently, researchers have applied feature selection techniques to reduce the complexity of classifiers and to enhance performance as well (Kim, 2000; Oliveira, 2003a; Nunes, 2004). In general, feature extraction techniques are widely divided into two types of feature: statistical and structural (Bortolozzi et al., 2005). Statistical features are derived from statistical distributions of points, such as grid (zoning) features, moments and horizontal/vertical projection profiles. Structural features are obtained from geometrical and topological properties of the character, such as strokes and their directions, end points, loops, ascenders and descenders, and the overall shape of the word. A few researchers have also integrated these two complementary types of feature to improve character/word recognition rates (Cai and Liu, 1999; Heute, 1998; Britto et al., 2004; Nadia and Najoua, 2006; Gatos et al., 2006a).
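To make the statistical/structural distinction concrete, the sketch below extracts two common statistical features from a binary character image: grid-zone ink densities and normalized projection profiles. Structural features such as loops or end points require contour or skeleton analysis and are not shown; the grid size and normalization here are arbitrary choices.

import numpy as np

def statistical_features(binary_char, grid=(4, 4)):
    """Statistical feature vector: grid-zone ink densities plus normalized
    horizontal and vertical projection profiles of a binary character image."""
    img = binary_char.astype(float)
    h, w = img.shape
    gr, gc = grid
    zones = []
    for r in range(gr):
        for c in range(gc):
            zone = img[r * h // gr:(r + 1) * h // gr, c * w // gc:(c + 1) * w // gc]
            zones.append(zone.mean() if zone.size else 0.0)   # ink density per zone
    total = img.sum() or 1.0
    h_profile = img.sum(axis=1) / total    # row-wise ink distribution
    v_profile = img.sum(axis=0) / total    # column-wise ink distribution
    return np.concatenate([zones, h_profile, v_profile])

# Example on a 16x16 toy character.
img = np.zeros((16, 16), dtype=int)
img[2:14, 7:9] = 1        # a vertical bar, roughly an "l"
print(statistical_features(img).shape)   # -> (48,) = 16 zones + 16 + 16 profile values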

HANDWRITING RECOGNITION

Handwriting recognition techniques have been developed and investigated for the classification of characters, words and features. The classification techniques are divided into two main categories: statistical and intelligent classifiers. The statistical classifiers make
decisions based on a statistical decision function: an optimal criterion is derived to determine the probability that the observed pattern belongs to a certain class. Many successful recognition techniques are based on this strategy, such as template matching, the Bayesian classifier, the polynomial discriminant classifier, fuzzy logic/rules and k-Nearest Neighbor (k-NN). However, some statistical methods require all training samples to be stored and compared during the classification process (Liu and Fujisawa, 2005), which has been found to be impractical in real-world applications. Recently, neural network classifiers have proved to be powerful and successful for character/word recognition (Cho, 1997; Verma et al., 2004; Blumenstein et al., 2007). However, to improve the intelligence of these ANNs, many iterations, complex computations and learning algorithms are needed, which also consume processor time. Therefore, if the recognition accuracy is improved, the required learning time increases, and vice versa, which is the main drawback of ANN-based approaches. HMM-based classifiers have remained highly successful for numeral recognition, and recognition rates above 98% for off-line handwritten isolated numerals are reported in the literature (Cavalin et al., 2006; Britto et al., 2004; Arica and Yarman-Vural, 2002; Cai and Liu, 1999). Likewise, for the global word recognition problem, HMM-based techniques are growing successfully (Günter and Bunke, 2005; Schambach, 2005; Viard-Gaudin et al., 2005; Grandidier, 2003; Kundu, 2002; Senior, 2002; Plamondon and Srihari, 2000; El-Yacoubi, 1999). On the other hand, for analytical approaches, neural network classification has commonly been used in conjunction with dynamic programming (Gader et al., 1997). Recently, a few researchers have successfully employed support vector machines for numeral/character classification, and promising results above 99% are reported (Liu and Fujisawa, 2005). Moreover, support vector machines have also been used successfully for the classification of words in recent studies (Gatos et al., 2006b).
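As an illustration of the simplest statistical classifier listed above, here is a tiny k-nearest-neighbor character classifier over feature vectors; it also makes the storage drawback visible, since the whole training set must be kept and scanned for every query. The toy features and labels are invented.

import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Classify feature vector x by majority vote among its k nearest
    training samples (Euclidean distance).  The whole training set is
    stored and scanned, which is the drawback mentioned above."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D "features" for characters 'a' and 'b'.
train_X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
                    [0.8, 0.9], [0.9, 0.8], [0.85, 0.85]])
train_y = np.array(list("aaabbb"))
print(knn_classify(train_X, train_y, np.array([0.2, 0.2])))   # prints: a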


Handwriting Recognition with Fused Classifiers

The combination of multiple classifiers for handwriting recognition is a new and rapidly growing trend. It has also been found that the use of multistage and combined classifiers is more reliable than the classification decision of the best individual classifier for numeral/character classification (Camastra and Vinciarelli, 2003; Cao et al., 1995). This idea is presented in the literature under various titles, such as classifier fusion (Gader, 1996a), classifier combination (Kittler, 1998; Alkoot and Kittler, 1999; Xu, 1992), mixture of experts (Jacobs, 1991) and committees (Bishop, 1995). Moreover, researchers support different combination strategies in terms of accuracy and reliability. Some interesting results and comparisons can be found in (Suen, 1992; Xu, 1992; Alkoot, 1999; Kittler, 1998; Tax, 2000; Verma, 2001; Kuncheva, 2002; Breukelen et al., 2000; Chellapilla et al., 2006).

More recently, ensembles of classifiers have been introduced in the fields of machine learning and pattern recognition. The main theme behind an ensemble of classifiers is that several classifiers are derived from one base classifier, generated by adjusting different parameters: changing the training set (Breiman, 1996), the input features (Ho, 1998; Opitz, 1999), the input data by injecting randomness (Valentini, 2002), or the parameters and architecture of the base classifiers (Partridge, 1996). A few researchers have implemented ensembles of classifiers to improve both the accuracy and the reliability of handwriting recognition systems (Nishimura, 1999; Günter and Bunke, 2003; Oliveira, 2003b; Morita, 2004; Liu, 2004). A few researchers have followed a verification strategy at the expense of speed; such a scheme has been successfully applied to handwriting recognition in (Takahashi, 1993; Zhou, 2000; Britto et al., 2002; Oliveira, 2003c). Besides that, two-stage classification systems have also been implemented by a few researchers: at the first stage a simple and fast classifier is used, while at the second stage a more complex and better trained classifier is
applied to classify the patterns rejected by the first stage (Vuurpijl, 2003; Milgram, 2004; Nunes, 2004). On the other hand, intelligent recognition systems need to be trained with a huge amount of training data, which is not feasible in realistic cases. The collection of large databases has become an issue in the handwriting recognition community; additionally, it is time consuming and expensive to collect all possible samples and prepare their ground truth. Therefore, to avoid the collection of huge amounts of data, a new trend of training with synthetic data has emerged. Ha and Bunke (1997) and Mori (2000) made use of the synthetic generation of isolated characters, and the synthetic generation of handwritten words and sentences has been described in (Guyon, 1996; Helmers, 2003). At a higher level, Varga and Bunke (2003) proposed a geometrical distortion model for complete lines of handwritten text. They have demonstrated, through several experiments using HMM-based classifiers, that the use of synthetic data can improve recognition performance, particularly when the training set is small. A summary of the recognition performance of recent off-line handwritten word recognition systems, in chronological order, is given in Table 4.


Table 4

Summary of Recognition Performances


RR: Recognition Rate, Unc: Unconstrained, WD: Writer dependent, Cur: Cursive, Omni: Omni-writer, MONO: Single writer, K-NN: K-Nearest Neighbor, LOB: Lancaster-Oslo/Bergen, USPS: United States Postal Service.
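Before moving to the summary, here is a minimal sketch of one of the simplest combination rules referred to above, Borda count voting over the ranked outputs of several classifiers: each classifier orders the candidate classes from most to least likely, ranks are converted to points and summed. The three toy rankings are invented.

from collections import defaultdict

def borda_count(rankings):
    """Combine several classifiers' ranked class lists by Borda count.
    rankings : list of lists, each ordered from most to least likely class.
    Returns the classes sorted by combined score, highest first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - position      # top rank earns the most points
    return sorted(scores, key=scores.get, reverse=True)

# Three classifiers rank the candidate words for one input image.
c1 = ["top", "tap", "ten"]
c2 = ["tap", "top", "ten"]
c3 = ["top", "ten", "tap"]
print(borda_count([c1, c2, c3]))   # -> ['top', 'tap', 'ten']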

SUMMARY AND REMAINING PROBLEMS WITH SUGGESTIONS

In this chapter, the state of the art in offline cursive handwriting recognition and its associated components has been presented, with a strong emphasis on segmentation-based handwriting recognition techniques. A critical literature review of existing techniques and a comparative study of recent achievements in the area have also been presented, along with novel strategies by the authors to tackle existing problems in preprocessing and segmentation-based handwriting recognition. The ultimate target of handwriting recognition is to have machines that can read any text with the same recognition accuracy as humans but at a faster rate (Lorette, 1999). Indeed, research in this domain has shown
significant progress in this direction; however, future research needs to focus on the following shortcomings. Research has indeed matured in the area of numeral recognition, but the same accuracy level has not been reached for alphabetic characters. Cursive character recognition remains very much an open problem, mainly due to noisy, broken, multi-stroke, incomplete and ambiguous characters. To handle this type of problem, new feature extraction/selection techniques and multistage classifiers are desired. Furthermore, salient features have still not been determined that adequately distinguish difficult/ambiguous segmented cursive characters. Regarding word recognition, the problem appears to be solved for small and static lexicons using the holistic strategy; however, recognition accuracy drops significantly for larger lexicons, so segmentation-based word recognition is an alternative solution. On the other hand, segmentation algorithms have four major problems: (1) inaccurately cutting characters into parts; (2) missing many segmentation points; (3) over-segmenting a character many times, which contributes to errors in the word recognition process; and (4) negative effects on speed. Algorithms that can cope with the variety of writing styles, as well as appropriate features for describing suitable segmentation points of interest and for subsequently determining correct/incorrect segmentations, are still lacking. Therefore, new strategies need to be proposed and developed to improve segmentation accuracy and the overall accuracy of off-line cursive handwriting recognition.


REFERENCES

ALKOOT, M., AND KITTLER, J. (1999). Experimental evaluation of expert fusion strategies. Pattern Recognition Letters, 20(1113):1361-1369. ARICA, N., YARMAN-VURAL, F., T. (2002). Optical character recognition for cursive handwriting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6):801-813, 2002. BELONGIE, S., MALIK, J., PUZICHA, J. (2002). Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Analysis and Machine Intelligence, 24 (2), 509-522. BIPPUS, R., AND MARGNER, V. (1999). Script recognition using inhomogeneous p2dhmm and hierarchical search space reduction. In Proc. 5th International Conference on Document Analysis and Recognition.773-776, Bangalore, India BISHNU, A., AND CHAUDHURI, B., B. (2007). Segmentation of Bangla Handwritten Text into Characters by Recursive Contour Following. Proceedings on Computing Theory and Applications. (ICCTA'07). BLUMENSTEIN, M., AND VERMA, B. (1997). A Segmentation Algorithm used in Conjunction with Artificial Neural Networks for the Recognition of Real-World Postal Addresses. Proceedings of 2nd Online World Conference on Soft Computing in Engineering Design and Manufacturing. BLUMENSTEIN, M., AND VERMA, B. (1998a). An Artificial Neural Network Based Segmentation Algorithm for Off-line Handwriting Recognition. Proceedings of the Second International Conference on Computational Intelligence and Multimedia Applications, Gippsland, Australia, 306- 311. BLUMENSTEIN, M., AND VERMA, B. (1998b). A Neural Based Segmentation and Recognition Technique for Handwritten

Words, Proceedings of the World Congress on Computational Intelligence, Anchorage, Alaska, 1738-1742. BLUMENSTEIN, M., AND VERMA, B. (1998c).Conventional vs. Neuro-Conventional Segmentation Techniques for Handwriting Recognition: A Comparison, Proceedings of the Second IEEE International Conference on Intelligent Processing Systems, Gold Coast, Australia, 473-477. BLUMENSTEIN, M., AND VERMA, B., (1999a). Neural Solutions for the Segmentation and Recognition of Difficult Words from a Benchmark Database. Proceedings of the 5th International Conference on Document Analysis and Recognition, (ICDAR 99), Bangalore, India, 281-284. BLUMENSTEIN, M., AND VERMA, B. (1999b). A New Segmentation Algorithm for Handwritten Word Recognition. Proceedings of the International Joint Conference on Neural Networks, Washington., D.C., Vol 4, 878-882. BLUMENSTEIN, M., AND VERMA, B. (2001). Analysis of Segmentation Performance on the CEDAR Benchmark Database. Proceedings of 6th International Conference on Document Analysis and Recognition. 1142-1146. BLUMENSTEIN, M., CHENG, C., K., AND LIU, X., Y., (2002). New Preprocessing Techniques for Handwritten Word Recognition. Proceedings of 2nd International Conference on Visualization, Imaging and Image Processing, ACTA. Press, Calgary, 480484. BLUMENSTEIN, M., VERMA, B., AND BASLI, H. (2003). A novel feature extraction technique for the recognition of segmented handwritten characters. In M. Fairhurst, and A. Downton (Eds.), Proceedings of the Seventh International Conference on Document Analysis and Recognition, 137141. BLUMENSTEIN, M., LIU, X. Y., AND VERMA, B. (2004). A Modified Direction Feature for Cursive Character Recognition, Proceedings of the International Joint Conference on Neural Networks, Budapest, Hungary, 2983-2987.


BLUMENSTEIN, M., LIU, X., Y., AND VERMA, B.(2007). An Investigation of the Modified Direction Feature for Cursive Character Recognition. Pattern Recognition, Vol. 40, 376-388. BORTOLOZZI, F., SOUZA, A., BRITTO JR., LUIZ S. OLIVEIRA AND MORITA, M. (2005) Recent Advances in Handwriting Recognition. Document Analysis, Editors: Umapada Pal, Swapan K. Parui, Bidyut B. Chaudhuri, Pages 1-30. BOSE, C. B., KUO, S. (1994). Connected and Degraded Text Recognition using Hidden Markov Model. Pattern Recognition, 27(10), 1345-1363. BOZINOVIC, R., M., AND SRIHARI, S., N. (1989). Off-line Cursive Script Word Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1), 68-83. BRETTO, A., AZEMA, J., CHERIFI, H., AND LAGET, B. (1997). Combinatorics and Image Processing, Graphical Models and Image Processing vol. 59, 256-277. BRETTO, A., CHERIFI, H., AND ABOUTAJDINE, D. (2002) Hypergraph imaging: an Overview, Pattern Recognition vol. 35, 651-658. BREUKELEN, T., M., DUIN, R., KITTLER, J. (2000). Combining multiple classifiers by averaging or by multiplying? Pattern Recognition, 33(9):1475-1485. BRUEL, T. (1994). Design and Implementation of a System for Recognition of Handwritten Responses on US Census Forms, Proceedings of the IAPR Workshop on Document Analysis Systems, Kaiserlautern, Germany, 237-264. BROWN, M., K., GANAPATHY, S. (1983). Preprocessing Techniques for Cursive Script Word Recognition. Pattern Recognition, 16(5), 447-458. BREIMAN, L. (1996). Bagging predictors. Machine Learning, 24(2):123-140, 1996. BRITTO, JR. A., SABOURIN, R., BORTOLOZZI, F., SUEN, C. Y.(2001a): An Enhanced HMM Topology in an LBA Framework for the Recognition of Handwritten Numeral Strings, Proceedings of the International Conference on

Advances in Pattern Recognition, Vol 1, 105-114, Rio de Janeiro-Brazil. BRITTO JR. A, SABOURIN, R., BORTOLOZZI, F., SUEN, C-Y. (2002). A string length predictor to control the level building of HMMs for handwritten numeral recognition. Proceedings of 16th International Conference on Pattern Recognition, Vol. 4, 31-34. BRITTO JR., A. SABOURIN, R., BORTOLOZZI, F., AND SUEN, C.Y. (2004). Foreground and background information in an HMMbased method for recognition of isolated characters and numeral strings. Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition, 371376. BRETTO, A., CHERIFI, H., AND ABOUTAJDINE, D. (2002). Hypergraph imaging: an Overview. Pattern Recognition, vol.35, 651-658, 2002. BUNKE, H., ROTH, M AND SCHUKAT-TALAMAZZINI, E-G. (1995). Off-line cursive handwriting recognition using hidden Markov models. Pattern Recognition, 28(9):1399-1413. BUNKE, H., AND WANG, P., Editors. Handbook of Character Recognition and Document Image Analysis. World Scientific, 1997. Pages 123156. BURGES, C., J., C., BE, J., I., AND NOHL, C., R. (1992). Recognition of Handwritten Cursive Postal Words using Neural Networks. Proceedings of the 5th United States Postal Service (USPS) Advanced Technology Conference, 117-124. BURGES, C.J.C., DENKER, J.S., LECUN. Y., AND NOHI. C.R., (1993) Off-line recognition of handwritten postal words using neural networks. International Journal of Pattern Recognition and Artificial Intelligence. 7(4) 689- 704. CAESAR, T., GLOGER, J., M., AND MANDLER, E. (1993). Preprocessing and Feature Extraction for a Handwriting Recognition System. Proceedings of International Conference on Document Analysis and Recognition. 408-411. CAI, J., AND LIU, Z.-Q. (1999). Integration of structural and statistical information for unconstrained handwritten numeral recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(3), 263270.


CAI, J., AND LIU, Z-Q. (2000). Off-line unconstrained handwritten word recognition. International Journal of Pattern Recognition and Artificial Intelligence, 14(3), 259-280. CAMASTRA, F., AND VINCIARELLI, A. (2003). Combining neural gas and learning vector quantization for cursive character recognition. Neuro-computing, 51, 147-159. CAESAR, T., GLOGER, J.M., AND MANDLER, E. (1993). Preprocessing and feature extraction for a handwriting recognition system. International Conference on Document Analysis and Recognition, 408-411. CAO, J., AHMADI, M., AND SHRIDHAR, M. (1995). Recognition of handwritten numerals with multiple feature and multistage classifier. Pattern Recognition, 28(3), 153-159. CASEY, R., G. (1992). Segmentation of Touching Characters in Postal Addresses, Proceedings of the 5th USPS Advanced Technology Conference, 743-754. CASEY, R., G., AND LECOLINET, E. (1996). A Survey of Methods and Strategies in Character Segmentation, IEEE Trans. Pattern Analysis and Machine Intelligence, 18, 690-706. CAVALIN, P., R., BRITTO, A., S., BORTOLOZZI, F., SABOURIN, R., AND OLIVEIRA, L., S. (2006). An Implicit Segmentation based Method for Recognition of Handwritten Strings of Characters. Proceedings of ACM Symposium on Applied Computing, 836-840. CHARLES, R., GIARDINA, EDWARD, R., DOUGHERTY. (1988). Morphological Methods in Image and Signal Processing, Prentice Hall, Inc. CHEN, M-Y., KUNDU, A., ZHOU, J., AND SRIHARI, S., N. (1992). Off-Line Handwritten Word Recognition using Hidden Markov Model, Proceedings of the 5th USPS Advanced Technology Conference, 563-579. CHEN, M-Y., KUNDU, A., AND ZHOU, J. (1994). Off-line handwritten word recognition using a hidden Markov model type stochastic network. IEEE Trans. Pattern Analysis and Machine Intelligence, 16(5), 481-491. CHELLAPILLA, K., SHILMAN, M., SIMARD, P. (2006). Combining Multiple Classifiers for Faster Optical Character Recognition.
Proceedings of International conference on Document Analysis Systems, Springer, LNCS 3872, 358367. CHO, W., LEE, S. W., KIM, J. H.(1995). Modeling and Recognition of Cursive Words with Hidden Markov Models. Pattern Recognition, 28(12), 1941-1953. CHO, S.-B. (1997). Neural-network classifiers for recognizing totally unconstrained handwritten numerals. IEEE Transactions on Neural Networks, 8(1), 4353. CHEN, M-Y., AND KUNDU, A. (1993). An Alternative Approach to Variable Duration HMM in Handwritten Word Recognition, Proceedings of the 3rd International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, 82-91. CHEN, M-Y., KUNDU, A., ZHOU, J.(1994). Off-Line Handwritten Word Recognition Using a HMM Type Stochastic Network, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5), 481-496. CHEN, M-Y., KUNDU, A., AND SRIHARI, S., N., (1995). Variable duration hidden Morkov model and morphological segmentation for handwritten word recognition. IEEE transactions on Image Processing, 4(12), 1675-1687. CHEN, Y. AND LEEDHAM, G. (2005). Independent Component Analysis Segmentation Algorithm. Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR05), Vol. 2, 680- 684. CHENG, C. K., LIU, X., Y., BLUMENSTEIN, M., AND MUTHUKKUMARASAMY, V. (2004). Enhancing Neural Confidence-Based Segmentation for Cursive Handwriting Recognition, 5th International Conference on Simulated Evolution and Learning, Busan, Korea, SWA-8. CHENG, C., K., AND BLUMENSTEIN, M. (2005a). The Neural Based Segmentation of Cursive Words Using Enhanced Heuristics. Proceedings of the 8th International conference on Document analysis and recognition. Vol 2, 650-654. CHENG, C., K., AND BLUMENSTEIN, M. (2005b). Improving the Segmentation of Cursive Handwritten Words using Ligature Detection and Neural Validation. Proceedings of the 4th Asia

Pacific International Symposium on Information Technology (APIS 2005), Gold Coast, Australia, 56-59. CHERIET, M. (1993). Reading Cursive Script by Parts. Proceedings of the 3rd International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, May 25-27, 403-408. CHERIET, M., KHARMA, N., LIU, C-LIN., SUEN, C-Y., (2007) Character Recognition Systems (OCR), Page 204-206(Wiley, 2007). CHEVALIER, S., GEOFFROIS, E., PRETEUX, F., AND LEMALTRE, M. (2005). A Generic 2D Approach of Handwriting Recognition. Proceedings of the 8th International Conference on Document Analysis and Recognition, 489493. CHIANG, J-H. (1998). A hybrid neural model in handwritten word recognition. Neural Networks, 11(2), 337346. COTE, M., LECOLINET, E., CHERIET, M. AND SUEN, C.Y. (1998). Automatic reading of cursive scripts using a reading model and perceptual conceptsthe PERCEPTO system. International Journal of Document Analysis and Recognition. 1(1), 317. DAWOUD, A. (2007). Iterative Cross Section Sequence Graph for Handwritten Character Segmentation. IEEE Transactions on Image Processing, 16(8). DECOSTE AND B. SCHLKOPF (2002). Training invariant support vector machines. Machine Learning Journal, 46(1-3):161-190. DIMAURO, D., IMPEDOVO, S., PIRLO, G., AND SALZO, A. (1998). An Advanced Segmentation Technique for Cursive Word Recognition. Advances in Handwriting Recognition, S. W. Lee (ed.), World Scientific Publishing,255-264.Technology Conference, 563-579. DONG, J-X, KRZYYZAK, A., AND SUEN, C-Y. (2001). A muti-net learning framework for pattern recognition. Proc. of the Sixth International Conference on Document Analysis and Recognition, Seattle, 328-332. DUNN, C., E., AND WANG, P., S., P., (1992). Character Segmenting Techniques for Handwritten Text - A Survey. Proceedings of 11th International Conference on Pattern Recognition, vol. 2, 577-591.


DZUBA, G., FILATOV, A., GERSHUNY, D., AND KILL, I. (1998). Handwritten word recognition, the approach proved by practice. In Proc. 6th International Workshop on Frontiers in Handwriting Recognition, 99-111, Taejon, Korea. EARNEST, L., D. (1962). Machine Recognition of Cursive Writing, Information Processing. C. Cherry (ed.), London, Butterworth, 462-466. EASTWOOD, B., JENNINGS, A., AND HARVEY, A. (1997). Neural Network Based Segmentation of Handwritten Words. Proceedings of 6th International Conference on Image Processing and its Applications, Vol. 2, 750-755. EHRICH, R., W., AND KOEHLER, K., J. (1975). Experiments in the Contextual Recognition of Cursive Script. IEEE Transactions on Computers, 24, 182-194. ELMS, A. J., PROCTER, S., ILLINGWORTH, J. (1998). The Advantage of using an HMM-based Approach for Faxed Word Recognition. International Journal on Document Analysis and Recognition, 18-36. ELLIMAN, D.G., AND LANCASTER, I.T. (1990). A review of segmentation and contextual analysis techniques for text recognition. Pattern Recognition, 23(3-4), 337-346. EL-YACOUBI, A., GILLOUX, M., SABOURIN, R., SUEN, C. Y. (1999). An HMM-based Approach for off-line unconstrained handwritten word modeling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8), 752-760. FAN, X., AND VERMA, B. (2002). Segmentation vs. non-segmentation based neural techniques for cursive word recognition. International Journal of Computational Intelligence and Applications, 2(4), 18. FAROUZ, C., GILLOUX, M., AND BERTILLE, J.M. (1998). Handwritten word recognition with contextual Hidden
Morkov Models. In Proc. 6th International Workshop on Frontiers in Handwriting Recognition. 133-142. Taejon. Korea. FAVATA, J., T., AND SRIHARI, S., N. (1992). Recognition of General Handwritten Words Using Hypothesis Generation and Reduction Methodology, Proceedings of the 5th USPS Advanced Technology Conference, 237-251. FAVATA, J., T. (1997). Character Model Word Recognition, Progress in Handwriting Recognition, A. C. Downton and S. Impedovo (eds.), 57-61. FAVATA, J. T., (2001). Offline General Handwritten Word Recognition using an Approximate Beam Matching Algorithm. IEEE Trans. on Pattern Analysis and Machine Intelligence, 23:393398. FOLEY J.D. DAM, A.V., FEINER, S.K., HUGHES, J.F. Computer Graphics: Principles and Practice in C, 2nd Edition, AddisonWesley, Pearson Education FREITAS, F. BORTOLOZZI AND R. SABOURIN. Handwritten isolated word recognition: An approach based on mutual information for feature set validation. In Proc. 6th International Conference on Document Analysis and Recognition, SeattleUSA, September, 665-669, 2001. FRISHKOPF, L., S., AND HARMON, L., D. (1961). Machine Reading of Cursive Script. Information Theory, C. Cherry (ed.), Butterworth, London, 300-316. 143 FUJISAWA, H., NAKANO, Y., AND KURINO, K. (1992). Segmentation methods for character recognition: From segmentation to document structure analysis. Proceedings of the IEEE, 80(7), 10791092. FUKUSHIMA, K., AND IMAGAWA, T. (1993). Recognition and Segmentation of Connected Characters with Selective Attention, Neural Networks, 6, 33-41. GADER P.D., MOHAMMED, M.A., AND CHIANG, J.H., (1994). Handwritten word recognition with character and inter character neural networks. IEEE Transaction systems. Man and Cybernetics- Part B, 27:158-164, 1994.


GADER, P.D., WHALEN, M., GANZBERGER, M. AND HEPP, D (1995). Hand printed word recognition on a NIST data set. Machine Vision and Applications. Vol. 8, 31-41. GADER, P.D., MOHAMED, M., AND CHIANG, J.-H. (1997). Handwritten word recognition with character and intercharacter neural networks. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 27(1), 158164. GADER, P. D., MOHAMED, M., A., KELLER, J.M. (1996b). Fusion of handwritten word classifiers. Pattern Recognition Letters, 17(6):577-584. GADER, P.D. AND KHABOU, M.A. (1996a). Automatic Feature Generation for Handwritten Digit Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 18(12), 1256-1261. GANG, L., VERMA, B., AND KULKARNI, S. (2002). Experimental analysis of neural network based feature extractors for cursive handwriting recognition. Proceedings of the IEEE World Congress on Computational Intelligence, 28372841. GATOS, B., PRATIKAKIS, I., AND PERANTONIS, S., J. (2006a). Hybrid Off-Line Cursive Handwriting Word Recognition. Proceedings of 18th International Conference on Pattern Recognition (ICPR06), vol. 2, 998-1002. GATOS, B., PRATIKAKIS, I., KESIDIS, A.L., AND PERANTONIS, S.J. (2006b). Efficient off-line cursive handwriting word recognition. Proceedings of the Tenth International Workshop on Frontiers in Handwriting Recognition. GATOS, B., ANTONACOPOULOS, A., STAMATOPOULOS, N. (2007). ICDAR 2007 Handwriting Segmentation Context. Proceedings of the International Conference on Document Analysis and Recognition, 1284-1288. GHOSH, M., GHOSH, R., AND VERMA, B. (2004). A Fully Automated Offline Handwriting Recognition System Incorporating Rule Based Neural Network Validated Segmentation and Hybrid Neural Network Classifier. International Journal of Pattern Recognition and Artificial Intelligence, 18(7), 1267-1283.


GILLIES, M.(1992). Cursive Word Recognition Using Hidden Markov Models. In Proc. Fifth U.S. Postal Service Advanced Technology Conference, 557-562. GILLOUX, M., BERTILLE, J., M., AND LEROUX, M. (1993). Recognition of Handwritten Words in a Limited Dynamic Vocabulary. Proceedings of the Third International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, May 25-27, 417-422. GILLOUX, M. (1993). Hidden Markov Models in Handwriting Recognition, Fundamentals in Handwriting Recognition, S. Impedovo (ed.), NATO ASI Series F: Computers and Systems Sciences, 124, Springer Verlag, New York, 264-288. GILLOUX M, LEROUX M, AND BERTILLE J-M. (1995a). Strategies for Cursive script recognition using Hidden Morkov Models. Machine vision and Application. 197-205. GILLOUX, M., LEMARIE, B., AND LEROUX, M. (1995b). A hybrid radial basis function / Hidden Morkov Model handwritten word recognition system. International conference on document analysis and recognition, Montreal, 394-397. GOVINDARAJU, V., AND SRIHARI, S., H. (1992). Separating Handwritten Text from Interfering Strokes, From Pixels to Features III- Frontiers in Handwriting Recognition, S. Impedovo, J.C. Simon (eds.), North-Holland Publication, 17-28. GRANDIDIER, F. (2003). Un Nouvel Algorithme de Slection de Caractristiques- Application la Lecture Automatique de l'ecriture Manuscrite. PhD thesis, cole de Technologie Suprieure, Montreal-Canada, Janvier. GUILLEVIC, D., AND SUEN, C., Y., (1993). Cursive Script Recognition: A Fast Reader Scheme. Proceedings of the 3rd International Conference on Documents Analysis and Recognition, 311-314. GUILLEVIC, D., AND SUEN, C., Y., (1994). Cursive Script Recognition: A Sentence Level Recognition Scheme. Proceedings of the 4th International Workshop on the Frontiers of Handwriting Recognition, 216-223.


GUILLEVIC, D., AND SUEN, C. (1998). HMM-KNN word recognition engine for bank check processing. In Proc. International Conference on Pattern Recognition. 1526-1529. Brisbane. Australia. GNTER, S., AND BUNKE, H (2003). Ensembles of classifiers for handwritten word recognition, International Journal on Document Analysis and Recognition, 5:224-232. GNTER, S., AND BUNKE, H. (2004). Feature selection algorithms for the generation of multiple classier systems and their application to handwritten word recognition. Pattern Recognition Letters, 25(11), 13231336. GNTER, S., AND BUNKE, H. (2005). Off-line cursive handwriting recognition using multiple classifier systems. On the influence of vocabulary, ensemble, and training set size. Optics and Lasers in Engineering, 43(3-5), 437454. GUYON (1996). Handwritten synthesis from handwritten glyphs. 5th International Workshop on Frontiers of Handwriting Recognition, 309-312. HA, T., BUNKE, H. (1997). Off-line handwritten numeral recognition by perturbation method. IEEE Trans. Pattern Analysis and Machine Intelligence, 19 (5), 535-539. HA, T., ZIMMERMANN, M., AND BUNKE, H (1998). Off-line Handwritten Numeral String Recognition by Combining Segmentation-based and Segmentation-free Methods. Pattern Recognition, 31(3), 257272. HAMAMURA, T., AKAGI, T., IRIE, B. (2007). An Analytic Word Recognition Algorithm Using a Posteriori Probability. Proceedings of International Conference on Document Analysis and Recognition. Vol. 02, 669-673. HAN, K., AND SETHI, I., K. (1995). Off-line Cursive Handwriting Segmentation, Proceedings of the 3rd International Conference on Documents Analysis and Recognition, 894-897. HANMANDLU, M., MURALI, K.R.M., CHAKRABORTY, S., GOYAL, S., AND CHOUDHURY, D.R. (2003). Unconstrained handwritten character recognition Based on fuzzy logic. Pattern Recognition, 36(3), 603623.


HAYES, K., C. (1980). Reading Handwritten Words Using Hierarchical Relaxation. Computers Graphics and Image Processing, Vol. 14, 344-364. HELMERS, M., BUNKE, H. (2003). Generation and use of the synthetic training data in cursive handwriting recognition. First Iberian Conf. on Pattern Recognition and Image Analysis, 336345. HEUTE L., T. PAQUET, J. V. MOREAU, Y. LECOURTIER, C. OLIVIER (1998). A structural/statistical feature-based vector for handwritten character recognition. Pattern Recognition Letters, 19, 629-641. HO, T.K. (1998). The random subspace method for constructing decision forests. IEEE Trans. on Pattern Analysis and Machine Intelligence, 20(8):832-844. HOLT, M., BEGLOU, M., AND DATTA, S. (1992). Slant-Independent Letter Segmentation for Off-line Cursive Script Recognition, From Pixels to Features III, S. Impedovo and J.C. Simon (eds.), Elsevier, page 41. HOWE, N.R., RATH, T.M., AND MANMATHA, R. (2005). Boosted decision trees for word recognition in handwritten document retrieval. Proceedings of the 28th Annual SIGIR Conference on Research and Development in Information Retrieval, 377383. INDIRA, K., SELVI, S. (2007). An Off line Cursive Script Recognition System using Fourier-Wavelet Features. International Conference on Computational Intelligence and Multimedia Applications, 506-511. JIANG, T. AND ZHANG, K.S. (2004). Efficient and robust feature extraction by maximum margin criterion. Proceedings of Advances in Neural Information Processing Systems, Vol. 16, 97104. KAVALLIERATOU, E., STAMATATOS, E., FAKOTAKIS, N., AND KOKKINAKIS, G. (2000b). Handwritten Character Segmentation Using Transformation-Based Learning. Proceedings of 15th International Conference on Pattern Recognition, Vol 2, 634637.


KAVALLIERATOU, E., DROMAZOU, N., FAKOTAKIS, N., AND KOKKINAKIS, G. (2003). An Integrated System for Handwritten Document Image Processing. International Journal on Pattern Recognition and Artificial Intelligence, 17(4), 617-636. KAPP, M.N., DE ALMENDRA FREITAS, C., AND SABOURIN, R. (2007). Methodology for the design of NN-based month-word recognizers written on Brazilian bank checks. Image and Vision Computing, 25(1), 40-49. KIM, G. (1996). Recognition of Off-line Handwritten Words and its Extension to Phrase Recognition. Ph.D. thesis, State University of New York at Buffalo. KIM, G., AND GOVINDARAJU, V. (1997). A Lexicon Driven Approach to Handwritten Word Recognition for Real-Time Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19(4), 366-379. KIM, J.H., KIM, K.K., SUEN, C.Y. (2000). An HMM-MLP hybrid model for cursive script recognition. Pattern Analysis and Applications, 3:314-324. KIM, D. (2003). Slant Correction of Handwritten Strings Based on Structural Properties of Korean Characters. Pattern Recognition Letters, No. 12, 2093-2101. KIM, G., GOVINDARAJU, V., AND SRIHARI, S.N. (1999). Architecture for Handwritten Text Recognition Systems. Advances in Handwriting Recognition, 163-182. KIM, G., AND GOVINDARAJU, V. (1996). Efficient Chain-code-based Image Manipulation for Handwritten Word Recognition. Proceedings of SPIE - The International Society for Optical Engineering, Bellingham, WA, USA, Vol. 2660, 262-272. KIM, K.K., KIM, J.H., SUEN, C.Y. (2002). Segmentation-based Recognition of Handwritten Touching Pairs of Digits using Structural Features. Pattern Recognition, 23(1), 13-21.


KIMURA, F., TSURUOKA, S., SHRIDHAR, M., AND CHEN, Z. (1992). Context-Directed Handwritten Word Recognition for Postal Service Applications. Proceedings of the 5th USPS Advanced Technology Conference, 199-213. KIMURA, F., SHRIDHAR, M., AND NARASIMHAMURTHI, N. (1993).Lexicon-Directed Segmentation- Recognition Procedure for Unconstrained Handwritten Words. Proceedings of the Third International Workshop on Frontiers in Handwriting Recognition, Buffalo, New York, May 25-27, 122-131. KIMURA, F., SHRIDHAR, M., AND CHEN, Z. (1993). Improvements of a Lexicon Directed Algorithm for Recognition of Unconstrained Handwritten Words. Proceedings of the second International Conference on Document Analysis and Recognition, Tsukuba, Japan, 18-22. KNERR, S., ANISIMOV, V., BARET, O., GORSKI, N., PRICE D., AND SIMON, J.C. 1997.The A@IA Inter-check system. Courtesy amount and legal amount recognition for French Checks. Automatic bank cheque processing 43-86. KNERR, S., AUGUSTIN, E., BARET, O., PRICE, D. (1998). Hidden Markov Model Based Word Recognition and its Application to Legal Amount Reading on French Checks. Computer Vision and Image Understanding, 70 (3), 404-419. KOCH, PAQUET, T., HEUTTE, L. (2004). Combination of Contextual Information for Handwritten Word Recognition. 9th International Workshop on Frontiers in Handwriting Recognition, Kokubunji, Tokyo, Japan, 468-473. KOERICH, A., L., SABOURIN, R. AND SUEN, C., Y. (2003) Large Vocabulary Off-line Handwriting Recognition: A Survey. Pattern Analysis and Application, 6(2), 97-121. KOERICH, L., SABOURIN, R., SUEN, C-Y. (2004) Fast TwoLevel HMM Decoding Algorithm for Large Vocabulary Handwriting Recognition, 9th International Workshop on Frontiers in Handwriting Recognition, October 26-29, Kokubunji, Tokyo, Japan, 232-238. KOERICH, A.L., SABOURIN, R., AND SUEN, C.Y. (2005). Recognition and verification of unconstrained handwritten


words. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 15091522. KOERICH, A.L., BRITTO, A., OLIVEIRA, L.E.S., AND SABOURIN, R. (2006). Fusing high- and low-level features for handwritten word recognition. Proceedings of the Tenth International Workshop on Frontiers in Handwriting Recognition. KITTLER, J., HATEF, M., DUIN, R., AND MATAS, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226-239. KUNDU, Y. HE, M. CHEN (2002). Alternatives to variable duration HMM in handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1275-1280. LALLICAN P.M. AND VIARD-GAUDIN C.(1998). Off-line handwriting Modeling as a Trajectory tracking Problem. International workshop on frontiers in handwriting recognition, IWFHR6, Taejon, Korea, 347-356. LECOLINET, E., AND CRETTEZ, J-P. (1991). A Grapheme-Based Segmentation Technique for Cursive Script Recognition. Proceedings of the 1st International Conference on Document Analysis and Recognition, St Malo, France, 740-748. LEE, L., AND COELHO, S. (2005). A simple and efficient method for global handwritten word recognition applied to Brazilian bank checks. Proceedings of the 8th International Conference on Document Analysis and Recognition, 950955. LIOLIOS, N., FAKOTAKIS, N., AND KOKKINAKIS, G. (2002). On the Generalization of the Form Identification and Skew Detection Problem. Pattern Recognition 35, 253264. LIU, J., AND GADER, P. (2002). Neural Networks with Enhanced Outlier Rejection Ability for Off-line Handwritten Word Recognition. Pattern Recognition, 35: 20612071. LIU, C.-L, NAKASHIMA, K., SAKO, H., AND FUJISAWA, H. (2002). Handwritten digit recognition using state-of-the-art techniques. In Proc. of 8th International Workshop on Frontiers of Handwriting Recognition, 320325.


LIU, C-L, NARUKAWA, K. (2004). Normalization Ensemble for Handwritten Character Recognition. 9th International Workshop on Frontiers of Handwriting Recognition, 69-74. LIU, C-L., AND FUJISAWA, H. (2005). Classification and learning for character recognition: Comparison of methods and remaining problems. Proceedings of the International Workshop on Neural Networks and Learning in Document Analysis and Recognition, 57. LONCARIC. (1998). A Survey of Shape Analysis Techniques. Pattern Recognition, Vol. 31(8), 983-1001. LORETTE (1999). Handwriting recognition or reading? What is the situation at the dawn of the 3rd millennium? International Journal on Document Analysis and Recognition, Vol. 2, 2-12. LU, Y. (1995). Machine printed character segmentation-An overview. Pattern Recognition, 28(1), 6780. LU, Y., AND SHRIDHAR, M. (1996). Character Segmentation in Handwritten Words - An Overview, Pattern Recognition vol. 29, 77-96. MADHVANATH, S., KLEINBERG, E., AND GOVINDARAJU, V. (1999). Holistic Verification of Handwritten Phrases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 1344-1356. MADHVANATH, S., AND SHRIHARI, S. (1996). A Technique for Local Baseline Determination, Proceedings of the 5th International Workshop on Frontiers in Handwriting Recognition, 445-448. MADHVANATH, S., AND GOVINDARAJU, V. (2001).The Role of Holistic Paradigms in Handwritten Word Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (2). MAIER, M. (1986).Separating Characters in Scripted Documents, Proceedings of the 8th International Conference on Pattern Recognition, Paris, 1056-1058. MANTAS, J. (1986). An Overview of Character Recognition Methodologies, Pattern Recognition. 19, 425-430.


MARTI, U., AND BUNKE, H. (2001). Using a Statistical Language Model to improve the performance of an HMM-based Cursive Handwriting Recognition System. International Journal of Pattern Recognition and Artificial Intelligence, 15(1), 65-90. MARTI, U., AND BUNKE, H. (2002). The IAM database: An English Sentence Database for Off-line Handwriting Recognition. International Journal of Document Analysis and Recognition, 5, 39-46. MARTIN, G.L., RASHID, M., AND PITTMAN, J.A. (1993). Integrated segmentation and recognition through exhaustive scans or learned saccadic jumps. International Journal on Pattern Recognition and Artificial Intelligence, 7(4), 831-847. MARINAI, S., GORI, M., AND SODA, G. (2005). Artificial neural networks for document analysis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(1), 23-35. MILGRAM, J., CHERIET, M., SABOURIN, R. (2004). Speeding up the decision making of Support Vector Classifier. Proceedings of 9th International Workshop on Frontiers in Handwriting Recognition, 57-62. MOHAMED, M.A., AND GADER, P. (1996). Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(5):548-554. MOHAMED, M.A., AND GADER, P. (2000). Generalized Hidden Markov Models - Part II: Application to handwritten word recognition. IEEE Transactions on Fuzzy Systems, (8), 82-94. MORI, S., SUEN, C.Y., AND YAMAMOTO, K. (1992). Historical Overview of OCR Research and Development. Proceedings of the IEEE, 80, 1029-1058. MORI, M., SUZUKI, A., SIHO, A., OHTSUKA, S. (2000). Generating new samples from handwritten numerals based on point correspondence. 7th International Workshop on Frontiers of Handwriting Recognition, 281-290.


MORITA, M., FACON, J., BORTOLOZZI, F., GARNES, S., AND SABOURIN, R. (1999) Mathematical morphology and weighted least squares to correct handwriting baseline skew, in Proceedings of the International Conference on Document Analysis and Recognition, vol. 1, Bangalore, pp.430433. MORITA, M., OLIVEIRA, L. S. SABOURIN, R. (2004). Unsupervised Feature Selection for Ensemble of Classifiers. IWFHR-9, pp.81-86. NADIA, A., AND NAJOUA, E. (2006) Combining a hybrid Approach for Features Selection and Hidden Markov Models in Multifont Arabic Characters Recognition. Proceedings of the Second International Conference on Document Image Analysis for Libraries (DIAL06), 103-107. NICCHIOTTI, G., SCAGLIOLA, C. AND RIMASSA, S. (2000). A Simple and Effective Cursive Word Segmentation Method. Proceedings of the 7th International Workshop on Frontiers in Handwriting Recognition, September, Amsterdam, ISBN 9076942-01-3, Nijmegen: International Unipen Foundation, pp. 499-504. NISHIMURA, M. KOBAYASHI, M. MARUYAMA, Y. NAKANO (1999). Offline character recognition using HMM by multiple directional feature extraction and voting with bagging algorithm. Proceedings of 5th International Conference on Document Analysis and Recognition, 4952. OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., SUEN, C. Y. (2002). Automatic Recognition of Handwritten Numerical Strings: A Recognition and Verification Strategy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(11), 1438-1454. OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., SUEN, C. Y. (2003a). A methodology for feature selection using multiobjective genetic algorithms for handwritten digit string recognition. International Journal on Pattern Recognition and Artificial Intelligence, 17(6):903-930. OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., SUEN, C. Y. (2003b). Feature Selection for Ensembles: A Hierarchical


Multi-Objective Genetic Algorithm Approach, 7th International Conference on Document Analysis and Recognition, vol. 2, 676-680. OLIVEIRA, L. S., SABOURIN, R., BORTOLOZZI, F., SUEN, C. Y. (2003c). Impacts of verification on a numeral string recognition system. Pattern Recognition Letters, 24(7):10231031. OLIVEIRA, L., S., SABOURIN, R. (2004). Support Vector Machines for Handwritten Numerical String Recognition, 9th International Workshop on Frontiers in Handwriting Recognition, October 26-29, Kokubunji, Tokyo, Japan,3944.Computer Society Press. OLIVIER, C. PAQUET ,T., AVILA, M., AND LECOURTIER, Y. (1995). Recognition of handwritten words using Stochastic Models. International conference on document analysis and recognition, 19-23. OPTIZ, D.W. (1999). Feature Selection for Ensembles. 16th Int. Conf. on Artificial Intelligence, pages 379-384. OTSU, N. (1979). A Threshold Selection Method from Gray level Histograms, IEEE Trans. on Systems, Man and Cybernetics 9(1), 62-66. PAL, U., BELAID, A., CHOISY, C. (2003). Touching Numeral Segmentation using Water Reservoir Concept. Pattern Recognition Letters, 24:261-272. PARTRIDGE, D., YATES, W.B. (1996). Engineering Multiversion Neural-net Systems. Neural Computation, 8(4):869893. PASTER, M., TOSELLI, A., AND VIDAL, E. (2004). Projection Profile Based Algorithm for Slant Removal. Proceedings of the International Conference on Image analysis and Recognition, 183-190. PINALES RUIZ. J, JAIME-RIVAS, R, CASTRO, M.J (2007). Discriminative capacity of perceptual features in handwriting recognition. Telecommunications and Radio Engineering, 64 (11), 931-937. PLAMONDON, R., AND SRIHARI, S., N. (2000). On-Line and OffLine Handwriting Recognition: A Comprehensive Survey,


IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 63-84. PUDIL, P., NOVOVICOVA, J., AND KITTLER, J.(1994). Floating search methods in handwritten material feature selection. Pattern Recognition Letters, vol. 15, 1119-1125. POSTL, W. (1986). Detection of Oblique Structures and Skew Scan in Digitized Documents. Proceedings of International Conference on Pattern Recognition. 687- 689. PROCTER, S., ELMS, A. J. (1998). The Recognition of Handwritten Digit Strings of Unknown Length using Hidden Markov Models. Proceedings of the Fourteenth International Conference on Pattern Recognition (ICPR'98), 1515-1517. PROCTER, S., AND ILLINGWORTH, J. (1999). Handwriting recognition using HMMs and a conservative level building algorithm. In Proc. 7th International Conference on Image Processing and its Applications. 736-739. Manchester. RABINER, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257-286. RAMESH, D.,R., PIYUSH, M., K., AND MAHESH, D., D.(2006) Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-Textual Document Images: A Novel Approach. Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition, Las Vegas, Nevada, USA, June 26-29, Vol 2, 510-515. SAMRAJYA, P., LAKSHMI, M., HANMANDLU AND SWAROOP, A. (2006). Segmentation of Cursive Handwritten Words using Hypergraph, 1-4, TENCON, IEEE region 10 Conference. SAYRE, K., M. (1973). Machine Recognition of Handwritten Words: A Project Report. Pattern Recognition, 5, 213-228. SCAGLIOLA, C., AND NICCHIOTTI, G (2000). Enhancing cursive word recognition performance by integration of all the available information. In Proc. 7th International Workshop on Frontiers in Handwriting Recognition, 363-372. Amsterdam. Netherlands.


SCHAMBACH, M.-P. (2005). Fast script word recognition with very large vocabulary. Proceedings of the 8th International Conference on Document Analysis and Recognition, 913. SENIOR, A., W. (1994). Off-Line Cursive Handwriting Recognition Using Recurrent Neural Networks, PhD Dissertation, University of Cambridge, England. SENIOR, A., W., AND ROBINSON, A., J. (1998). An Off-line Cursive Handwriting Recognition System, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 309-321. SENIOR, W., ROBINSON, A.J. (2002). An off-line cursive handwriting recognition system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):309-321. SERRA, J. (1982). Image Analysis and Mathematical Morphology, Academic Press, London. SHRIDHAR, M., KIMURA, F. (1995). Handwritten Address Interpretation using Word Recognition with and without Lexicon. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Piscataway, NJ, USA, Vol. 3, 2341-2346. SIMON, J., C. (1992). Off-Line Cursive Word Recognition, Proceedings of the IEEE, 80, 1150-1161. SIMONCINI, L., AND KOVACS-V, ZS. M. (1995). A System for Reading USA Census90 Hand-Written Fields. Proceedings of 3rd International Conference on Document Analysis and Recognition Vol. II. 8691. SIN, B.K. AND KIM, J.H. (1997). Ligature Modeling for Online Cursive Script Recognition, IEEE Trans. on PAMI 19(6), 623633. SINGH, S., AND AMIN, A. (1999). Neural network recognition of hand printed characters. Neural Computing & Applications, 8(1), 6776. SINHA, R., M., K., PRASADA, B., HOULE, G., AND SABOURIN, M (1993). Hybrid Contextual Text Recognition with String Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, 15, 915- 925.


SRIHARI, S.N., AND LAM, S.W. (1995). Character Recognition. Technical Report, CEDAR-TR-95-1. STEINHERZ, T., RIVLIN, E., AND INTRATOR, N. (1999). Off-line Cursive Script Word Recognition - A Survey. International Journal of Document Analysis and Recognition, Vol. 2, 90-110. STEVENS, M.E. (1961). Automatic Character Recognition: A State-of-the-Art Report. National Bureau of Standards. STEFANELLI, R., AND ROSENFELD, A. (1971). Some Parallel Thinning Algorithms for Digital Pictures. Journal of the Association for Computing Machinery, vol. 18, 255-264. STEFANO, C.D., AND MARCELLI, A. (2002). From Ligatures to Characters: A Shape-based Algorithm for Handwriting Segmentation. Proceedings of the 8th International Workshop on Frontiers in Handwriting Recognition (IWFHR'02), 473-478. SUEN, C.Y., LEGAULT, R., NADAL, C., CHERIET, M., AND LAM, L. (1993). Building a New Generation of Handwriting Recognition Systems. Pattern Recognition Letters, 14, 305-315. SUEN, C.Y., AND TAN, J. (2005). Analysis of errors of handwritten digits made by a multitude of classifiers. Pattern Recognition Letters, 26(3), 369-379. TAIRA, E., UCHIDA, S., AND SAKOE, H. (2004). Non-uniform slant correction for handwritten word recognition. IEICE Transactions on Information and Systems, vol. E87-D, no. 5, 1247-1253. TAPPERT, C.C., SUEN, C.Y., AND WAKAHARA, T. (1990). The State of the Art in On-line Handwriting Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 8, 787-793. TAKAHASHI, T., GRIFFIN (1993). Recognition enhancement by linear tournament verification. Proceedings of 2nd


International Conference on Document Analysis and Recognition, 585-588. TAY, Y.H. (2002). Offline Handwriting Recognition using Artificial Neural Network and Hidden Markov Model. PhD thesis, page 78. TAY, Y.H., KHALID, M., YUSOF, R., AND GAUDIN, C.V. (2003). Offline Cursive Handwriting Recognition System based on Hybrid Markov Model and Neural Networks. Proceedings of IEEE International Symposium on Computational Intelligence in Robotics and Automation, Kobe, Japan, 1190-1195. TEOW, L.N., LOE, K.F. (2002). Robust vision-based features and classification schemes for off-line handwritten digit recognition. Pattern Recognition, 35(11), 2355-2364. TOMOYUKI, H., TAKUMA, A., BUNPEI, I. (2007). An Analytic Word Recognition Algorithm using a Posteriori Probability. Proceedings of the 9th International Conference on Document Analysis and Recognition, vol. 02, 669-673. TRIER, O.D., JAIN, A.K., TAXT, T. (1996). Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, vol. 29, no. 4, 641-662. VALENTINI, G., DIETTERICH, T.G. (2002). Bias-Variance Analysis and Ensembles of SVM. 3rd International Workshop on Multiple Classifier Systems, 222-231. VARGA, T., AND BUNKE, H. (2003). Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System. In Proceedings of the 7th International Conference on Document Analysis and Recognition, Edinburgh, Scotland, 618-622. VELOSO, L.R., SOUSA, R.P. DE, AND CARVALHO, J.M. (2000). Morphological Cursive Word Segmentation. XIII Brazilian Symposium on Computer Graphics and Image Processing, 337-342. VERMA, B., AND BLUMENSTEIN, M. (1996). An Intelligent Neural System for a Robot to Recognize Printed and Handwritten


Postal Addresses. Proceedings of Fourth IASTED International Conference on Robotics and Manufacturing, IASTED RM96, Hawaii, USA, 80-84. VERMA, B., BLUMENSTEIN, M., AND KULKARNI, S. (1998). Recent Achievements in Off-line Handwriting Recognition Systems. Proceedings of the Second International Conference on Computational Intelligence and Multimedia Applications, (ICCIMA 98), Gippsland, Australia, 27-33. VERMA, B., GADER, P. (2000). Fusion of Multiple Handwritten Word Recognition Techniques. Neural Networks for Signal Processing X, 2000. Proceedings of the 2000 IEEE Signal Processing Society Workshop, vol.2, 926-934. VERMA, B.K., GADER, P., AND CHEN, W (2001). Fusion of Multiple Handwritten Word Recognition Techniques, Pattern Recognition Letters, 22(9), 991-998. VERMA, B. (2002). A Contour Character Extraction Approach in Conjunction with a Neural Confidence Fusion Technique for the Segmentation of Handwriting Recognition. Proceeding of the 9th International Conference on Neural Information Processing, vol. 5, 2459-2463. VERMA, B. (2003). A Contour Code Feature Based Segmentation for Handwriting Recognition. Proceedings of 7th International Conference on Document Analysis and Recognition (ICDAR03), 1203-1207. VIARD-GAUDIN, C., LALLICAN, P.-M., AND KNERR, S. (2005). Recognition-directed recovering of temporal information from handwriting images. Pattern Recognition Letters, 26(16), 25372548. VINCIARELLI, A. AND LUETTIN, J. (1999). Off-line cursive script recognition: IDIAP Research Report IDIAP-RR 00-43. VINCIARELLI, A. (2002). A survey on off-line cursive word recognition. Pattern Recognition, 35(7), 14331446. VUURPIJL, L., SCHOMAKER, L. VAN, M. (2003). Architectures for detecting and solving conflicts: two-stage classification and support vector classifiers, Int. Journal on Document Analysis and Recognition, 5(4):213-223.


WATANABE, M., M., HAMAMMOTO, Y., YASUDA, T., AND TOMITA, S. (1997) Normalization Techniques of Handwritten Numerals for Gabor Filters. Proceedings of the International Conference on Document Analysis and Recognition, ICDAR IEEE, Los Alamitos, CA, Vol. 1, 303-307. WANG, D., SRIHARI, S. N. (1991). Analysis of Form Images. Proceeding of the International Conference on Document Analysis and Recognition, 181-186. WANG, L., WANG, X., AND FENG, J. (2006). On Image Matrix Based Feature Extraction Algorithms. IEEE Transactions on Systems Man and Cybernetics: Cybernetics Vol. 36(1), 194197. WANG, X., DING, X., AND LIU, C. (2005). Gabor filters based feature extraction for character recognition. Pattern Recognition, 38(3), 369379. WEN, Y., LU, Y., AND SHI, P. (2007). Handwritten Bangla numeral recognition system and its application to postal automation. Pattern Recognition, 40(1), 99107. XIAO, X., AND LEEDHAM, G, (2000). Knowledge-based English Cursive Script Segmentation, Pattern Recognition Letters, 21, 945-954. XU, Q., LAM, L., AND SUEN, C.Y. (2003). Automatic segmentation and recognition system for handwritten dates on Canadian bank cheque. In M. Fairhurst, and A. Downton (Eds.), Proceedings of the 7th International Conference on Document Analysis and Recognition, 704709. YAMADA, H., AND NAKANO, Y. (1996). Cursive Handwritten Word Recognition Using Multiple Segmentation Determined by Contour Analysis. IEICE Transactions on Information and Systems, E79-D, 464-470. YANIKOGLU, B., AND SANDON, P., A. (1998). Segmentation of OffLine Cursive Handwriting using Linear Programming, Pattern Recognition, 31, 1825-1833. ZHANG, D., AND LU, G., (2004). Review of shape representation and description techniques. Pattern Recognition, Vol. 37, 1-19.


ZHOU, J., GAN, Q., KRZYZAK, A., SUEN, C.-Y. (2000). Recognition and verification of touching handwritten numerals. 7th International Workshop on Frontiers of Handwriting Recognition, 179-188. ZIMMERMANN, M., BUNKE, H. (2002). Hidden Markov model length optimization for handwriting recognition systems. International Workshop on Frontiers in Handwriting Recognition, Niagara-on-the-Lake, 369-374.

3
HIDDEN MARKOV MODELS (HMMs) APPROACH FOR HANDWRITING RECOGNITION
Muhammad Faisal Zafar Dzulkifli Mohamad

INTRODUCTION

Hidden Markov models (HMMs) are widely used in the field of pattern recognition. Their original application was in speech recognition (Rabiner and Juang, 1993). Because of the similarities between speech and cursive handwriting recognition, HMMs have become very popular in handwriting recognition as well (Kundu, 1997). During the last few years, HMMs have been a frequently used approach in handwriting recognition. One of the reasons is their higher performance in medium to large vocabulary applications, where segmentation-recognition methods are used to cope with the difficulties of segmenting words into characters. Many systems use HMMs to model subword units (characters) and the Viterbi algorithm to find the best match between a sequence of observations and the models (Chen, 1995).


DEFINITION OF HIDDEN MARKOV MODEL

The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation can be generated, according to the associated probability distribution. It is only the outcome, not the state, that is visible to an external observer; the states are therefore hidden to the outside, hence the name Hidden Markov Model. An HMM consists of a set of states, and transition probabilities between those states. One or several of the states are defined as final states. For each state a likelihood value for each possible observation is defined (Simon, 2004). In order to define an HMM completely, the following elements are needed.

The number of states of the model, N.

The number of observation symbols in the alphabet, M. If the observations are continuous then M is infinite.

A set of state transition probabilities $A = \{a_{ij}\}$, of size $N \times N$, with

$a_{ij} = p\{q_{t+1} = j \mid q_t = i\}, \quad 1 \le i, j \le N,$    (1)

where $q_t$ denotes the current state. Transition probabilities should satisfy the normal stochastic constraints,

$a_{ij} \ge 0, \ 1 \le i, j \le N \quad \text{and} \quad \sum_{j=1}^{N} a_{ij} = 1, \ 1 \le i \le N.$    (2)

A probability distribution of the observation symbols in each of the states, $B = \{b_j(k)\}$, of size $N \times M$, with

$b_j(k) = p\{o_t = v_k \mid q_t = j\}, \quad 1 \le j \le N, \ 1 \le k \le M,$    (3)

where $v_k$ denotes the $k$th observation symbol in the alphabet and $o_t$ the current parameter vector. The following stochastic constraints must be satisfied:

$b_j(k) \ge 0, \ 1 \le j \le N, \ 1 \le k \le M \quad \text{and} \quad \sum_{k=1}^{M} b_j(k) = 1, \ 1 \le j \le N.$    (4)

The initial state distribution $\pi = \{\pi_i\}$, where

$\pi_i = p\{q_1 = i\}, \quad 1 \le i \le N.$    (5)

Therefore we can use the compact notation $\lambda = (A, B, \pi)$ to denote an HMM with discrete probability distributions.
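To make the compact notation $\lambda = (A, B, \pi)$ concrete, the short sketch below (an illustration added here, not part of the original text) builds a small discrete HMM with N = 2 states and M = 3 observation symbols as NumPy arrays; the particular values are arbitrary.

```python
import numpy as np

# Hypothetical 2-state, 3-symbol HMM lambda = (A, B, pi); the values are arbitrary.
A = np.array([[0.7, 0.3],          # a_ij = P(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],     # b_j(k) = P(o_t = v_k | q_t = j)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])          # pi_i = P(q_1 = i)

# Sanity checks: all three components must be row-stochastic.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```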

ASSUMPTIONS IN THE THEORY OF HMMs

For the sake of mathematical and computational tractability, the following assumptions are made in the theory of HMMs.


The Markov assumption

As given in the definition of HMMs, transition probabilities are defined as,

$a_{ij} = p\{q_{t+1} = j \mid q_t = i\}.$    (6)

In other words, it is assumed that the next state is dependent only upon the current state. This is called the Markov assumption and the resulting model becomes actually a first order HMM. However, generally the next state may depend on the past k states, and it is possible to obtain such a model, called a kth order HMM, by defining the transition probabilities as follows:

$a_{i_1 i_2 \ldots i_k j} = p\{q_{t+1} = j \mid q_t = i_1, q_{t-1} = i_2, \ldots, q_{t-k+1} = i_k\}, \quad 1 \le i_1, i_2, \ldots, i_k, j \le N.$    (7)

It is seen that a higher order HMM will have a higher complexity. Even though first order HMMs are the most common, some attempts have also been made to use higher order HMMs.

The Stationary Assumption

Here it is assumed that state transition probabilities are independent of the actual time at which the transitions take place. Mathematically, it is formulated as

$p\{q_{t_1+1} = j \mid q_{t_1} = i\} = p\{q_{t_2+1} = j \mid q_{t_2} = i\}$    (8)

for any $t_1$ and $t_2$.

The Output Independence Assumption

This is the assumption that current output (observation) is statistically independent of the previous outputs (observations). We can formulate this assumption mathematically, by considering a sequence of observations,

$O = o_1, o_2, \ldots, o_T.$

Then, according to this assumption, for an HMM $\lambda$,

$p\{O \mid q_1, q_2, \ldots, q_T, \lambda\} = \prod_{t=1}^{T} p(o_t \mid q_t, \lambda).$    (9)

However, unlike the other two, this assumption has a very limited validity. In some cases this assumption may not be fair enough and therefore becomes a severe weakness of the HMMs.

THREE BASIC PROBLEMS OF HMMS

Once we have an HMM, there are three problems of interest.


The Evaluation Problem

Given an HMM $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, what is the probability that the observations are generated by the model, $P\{O \mid \lambda\}$?

The Decoding Problem

Given a model $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, what is the most likely state sequence in the model that produced the observations?

The Learning Problem

Given a model $\lambda$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, how should we adjust the model parameters $\lambda = (A, B, \pi)$ in order to maximize $P\{O \mid \lambda\}$?

Of these three basic problems, the evaluation problem can be used for isolated (word) recognition, while the decoding problem is related to continuous recognition as well as to segmentation. The learning problem must be solved if we want to train an HMM for subsequent use in recognition tasks. The following discusses the solutions for these problems.


The Evaluation Problem and the Forward Algorithm

We have a model $\lambda = (A, B, \pi)$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, and $P\{O \mid \lambda\}$ must be found. We can calculate this quantity using simple probabilistic arguments, but that calculation involves a number of operations in the order of $N^T$. This is very large even if the length of the sequence, T, is moderate. Therefore we have to look for another method for this calculation. Fortunately there exists one which has a considerably lower complexity and makes use of an auxiliary variable $\alpha_t(i)$, called the forward variable. The forward variable is defined as the probability of the partial observation sequence $o_1, o_2, \ldots, o_t$ when it terminates at state i. Mathematically,

$\alpha_t(i) = p\{o_1, o_2, \ldots, o_t, q_t = i \mid \lambda\}.$    (10)

Then it is easy to see that the following recursive relationship holds:

$\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\, a_{ij}, \quad 1 \le j \le N, \ 1 \le t \le T-1,$    (11)

where

$\alpha_1(j) = \pi_j b_j(o_1), \quad 1 \le j \le N.$

Using this recursion we can calculate $\alpha_T(i)$, $1 \le i \le N$, and then the required probability is given by

$p\{O \mid \lambda\} = \sum_{i=1}^{N} \alpha_T(i).$    (12)

The complexity of this method, known as the forward algorithm, is proportional to $N^2 T$, which is linear with respect to T, whereas the direct calculation mentioned earlier has an exponential complexity.

In a similar way we can define the backward variable $\beta_t(i)$ as the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$, given that the current state is i. Mathematically, this is formulated as

$\beta_t(i) = p\{o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda\}.$    (13)

As in the case of $\alpha_t(i)$, there is a recursive relationship which can be used to calculate $\beta_t(i)$ efficiently:

$\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\, a_{ij}\, b_j(o_{t+1}), \quad 1 \le i \le N, \ 1 \le t \le T-1,$    (14)

where

$\beta_T(i) = 1, \quad 1 \le i \le N.$    (15)

Further we can see that

$\alpha_t(i)\beta_t(i) = p\{O, q_t = i \mid \lambda\}, \quad 1 \le i \le N, \ 1 \le t \le T.$

Therefore this gives another way to calculate $P\{O \mid \lambda\}$, by using both forward and backward variables, as given in eq. 16:

$p\{O \mid \lambda\} = \sum_{i=1}^{N} p\{O, q_t = i \mid \lambda\} = \sum_{i=1}^{N} \alpha_t(i)\beta_t(i).$    (16)

Eq. 16 is very useful, especially in deriving the formulas required for gradient based training.
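As an illustration of the recursions just described (added here; not part of the original text), the sketch below implements the forward pass of eqs. 10-12 and the backward pass of eqs. 13-15 in NumPy, reusing the hypothetical A, B and pi arrays from the earlier sketch and an integer-coded observation sequence.

```python
import numpy as np

def forward(A, B, pi, obs):
    """Return alpha with shape (T, N) for an integer-coded observation sequence."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(j) = pi_j b_j(o_1)
    for t in range(1, T):
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)  # eq. (11)
    return alpha

def backward(A, B, obs):
    """Return beta with shape (T, N)."""
    N, T = A.shape[0], len(obs)
    beta = np.ones((T, N))                            # beta_T(i) = 1, eq. (15)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # eq. (14)
    return beta

obs = [0, 2, 1, 1]                                    # arbitrary demo sequence
alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
print(alpha[-1].sum())                                # P(O | lambda), eq. (12)
print((alpha[1] * beta[1]).sum())                     # same value via eq. (16), any t
```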


The Decoding Problem and the Viterbi Algorithm

In this case we want to find the most likely state sequence for a given sequence of observations $O = o_1, o_2, \ldots, o_T$ and a model $\lambda = (A, B, \pi)$. The solution to this problem depends upon the way the most likely state sequence is defined. One approach is to find the most likely state $q_t$ at time t and to concatenate all such $q_t$'s. But sometimes this method does not give a physically meaningful state sequence. Therefore we would go for another method that has no such problems. In this method, which is commonly known as the Viterbi algorithm, the whole state sequence with the maximum likelihood is found. In order to facilitate the computation we define an auxiliary variable

$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} p\{q_1, q_2, \ldots, q_{t-1}, q_t = i, o_1, o_2, \ldots, o_{t-1} \mid \lambda\},$    (17)

which gives the highest probability that the partial observation sequence and state sequence up to time t can have, when the current state is i. It is easy to observe that the following recursive relationship holds:

$\delta_{t+1}(j) = b_j(o_{t+1}) \max_{1 \le i \le N} [\delta_t(i)\, a_{ij}], \quad 1 \le j \le N, \ 1 \le t \le T-1,$    (18)

where

$\delta_1(j) = \pi_j b_j(o_1), \quad 1 \le j \le N.$

So the procedure to find the most likely state sequence starts from the calculation of $\delta_T(j)$, $1 \le j \le N$, using the recursion in eq. 18, while always keeping a pointer to the "winning state" in the maximum finding operation. Finally the state $j^*$ is found, where

$j^* = \arg\max_{1 \le j \le N} \delta_T(j),$    (19)

and starting from this state, the sequence of states is back-tracked as the pointer in each state indicates. This gives the required set of states. The whole algorithm can be interpreted as a search in a graph whose nodes are formed by the states of the HMM at each of the time instants t, $1 \le t \le T$.

Generally, the learning problem is how to adjust the HMM parameters so that the given set of observations (called the training set) is represented by the model in the best way for the intended application. Thus it would be clear that the quantity we wish to optimize during the learning process can be different from application to application. In other words, there may be several optimization criteria for learning, out of which a suitable one is selected depending on the application. There are two main optimization criteria found in the literature: Maximum Likelihood (ML) and Maximum Mutual Information (MMI). The solutions to the learning problem under each of those criteria are described below.

Maximum Likelihood (ML) Criterion

In ML we try to maximize the probability of a given sequence of observations $O^w$, belonging to a given class w, given the HMM $\lambda_w$ of the class w, with respect to the parameters of the model $\lambda_w$. This probability is the total likelihood of the observations and can be expressed mathematically as $L_{tot} = P\{O^w \mid \lambda_w\}$. However, since we consider only one class w at a time, we can drop the subscript and superscript w's. Then the ML criterion can be given as

$L_{tot} = P\{O \mid \lambda\}.$    (20)

However, there is no known way to analytically solve for the model $\lambda = (A, B, \pi)$ which maximizes the quantity $L_{tot}$. But we can choose model parameters such that it is locally maximized, using an iterative procedure like the Baum-Welch method or a gradient based method, which are described below.

Baum-Welch Algorithm

This method can be derived using simple occurrence counting arguments, or using calculus to maximize the auxiliary quantity

$Q(\lambda, \bar{\lambda}) = \sum_{q} p\{q \mid O, \lambda\} \log[p\{O, q \mid \bar{\lambda}\}]$    (21)

over $\bar{\lambda}$. A special feature of the algorithm is the guaranteed convergence. To describe the Baum-Welch algorithm (also known as the Forward-Backward algorithm), we need to define two more auxiliary variables, in addition to the forward and backward variables defined in a previous section. These variables can, however, be expressed in terms of the forward and backward variables.

The first of those variables is defined as the probability of being in state i at time t and in state j at time t+1. Formally,

$\xi_t(i, j) = p\{q_t = i, q_{t+1} = j \mid O, \lambda\}.$    (22)

This is the same as

$\xi_t(i, j) = \frac{p\{q_t = i, q_{t+1} = j, O \mid \lambda\}}{p\{O \mid \lambda\}}.$    (23)

Using forward and backward variables this can be expressed as

$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, \beta_{t+1}(j)\, b_j(o_{t+1})}.$

The second variable is the a posteriori probability

$\gamma_t(i) = p\{q_t = i \mid O, \lambda\},$    (24)

that is, the probability of being in state i at time t, given the observation sequence and the model. In terms of forward and backward variables this can be expressed by

$\gamma_t(i) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\beta_t(i)}.$    (25)

One can see that the relationship between $\gamma_t(i)$ and $\xi_t(i, j)$ is given by

$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j), \quad 1 \le i \le N, \ 1 \le t \le T-1.$    (26)

Now it is possible to describe the Baum-Welch learning process, where the parameters of the HMM are updated in such a way as to maximize the quantity $P\{O \mid \lambda\}$. Assuming a starting model $\lambda = (A, B, \pi)$, we calculate the $\alpha$'s and $\beta$'s using the recursions of eqs. 11 and 14, and then the $\xi$'s and $\gamma$'s using eqs. 23 and 26. The next step is to update the HMM parameters according to eqs. 27 to 29, known as the re-estimation formulas:

$\bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N,$    (27)

$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad 1 \le i \le N, \ 1 \le j \le N,$    (28)

$\bar{b}_j(k) = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad 1 \le j \le N, \ 1 \le k \le M.$    (29)

These re-estimation formulas can easily be modified to deal with the continuous density case too.

Re-estimation is an iterative process. First, we initialize $\lambda = (A, B, \pi)$ with a best guess or, if no reasonable guess is available, we choose random values such that $\pi_i \approx 1/N$, $a_{ij} \approx 1/N$ and $b_j(k) \approx 1/M$. It is critical that A, B and $\pi$ be randomized, since exactly uniform values result in a local maximum from which the model cannot climb. As always, $\pi$, A and B must be row stochastic. The process is:

1. Initialize $\lambda = (A, B, \pi)$.
2. Compute $\alpha_t(i)$, $\beta_t(i)$, $\xi_t(i, j)$ and $\gamma_t(i)$.
3. Re-estimate the model $\lambda = (A, B, \pi)$.
4. If $P(O \mid \lambda)$ increases, go to 2.

Of course, it might be desirable to stop if $P(O \mid \lambda)$ does not increase by at least some predetermined threshold and/or to set a maximum number of iterations.
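A minimal sketch of one Baum-Welch re-estimation step (eqs. 27-29) is shown below; it is an added illustration rather than the authors' implementation, and it reuses the forward and backward functions from the earlier sketch.

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One re-estimation iteration for a discrete HMM; returns updated (A, B, pi)."""
    T, N, M = len(obs), A.shape[0], B.shape[1]
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)

    # xi_t(i, j) and gamma_t(i), eqs. (23)-(26)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])
        xi[t] = num / num.sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    new_pi = gamma[0]                                             # eq. (27)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]      # eq. (28)
    new_B = np.zeros((N, M))
    for k in range(M):                                            # eq. (29)
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi
```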


Gradient Based Method

In the gradient based method, any parameter $\theta$ of the HMM is updated according to the standard formula

$\theta_{new} = \theta_{old} - \eta \left.\frac{\partial J}{\partial \theta}\right|_{\theta = \theta_{old}},$    (30)

where J is a quantity to be minimized. We define in this case

$J = E_{ML} = -\log(p\{O \mid \lambda\}) = -\log(L_{tot}).$    (31)

Since the minimization of $J = E_{ML}$ is equivalent to the maximization of $L_{tot}$, eq. 30 yields the required ML optimization criterion. The problem is then to find the derivative $\partial J / \partial \theta$ for any parameter $\theta$ of the model. This can be easily done by relating J to the model parameters via $L_{tot}$. As a key step to do so, using eqs. 16 and 20 we can obtain

$L_{tot} = \sum_{i=1}^{N} p\{O, q_t = i \mid \lambda\} = \sum_{i=1}^{N} \alpha_t(i)\beta_t(i).$    (32)

Differentiating the last equality,

$\frac{\partial J}{\partial \theta} = -\frac{1}{L_{tot}} \frac{\partial L_{tot}}{\partial \theta}.$    (33)

Eq. 33 gives $\partial J / \partial \theta$ if we know $\partial L_{tot} / \partial \theta$, which can be found using eq. 32. However, this derivative is specific to the actual parameter concerned. Since there are two main parameter sets in the HMM, namely the transition probabilities $a_{ij}$, $1 \le i, j \le N$, and the observation probabilities $b_j(k)$, $1 \le j \le N$, $1 \le k \le M$, we can find the derivative of $L_{tot}$ with respect to each of the parameter sets and hence the gradient.

Gradient with respect to the transition probabilities. Using the chain rule,

$\frac{\partial L_{tot}}{\partial a_{ij}} = \sum_{t=1}^{T} \frac{\partial L_{tot}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial a_{ij}}.$    (34)

By differentiating eq. 32 with respect to $\alpha_t(j)$ we get

$\frac{\partial L_{tot}}{\partial \alpha_t(j)} = \beta_t(j),$    (35)

and differentiating (a time shifted version of) eq. 11 with respect to $a_{ij}$,

$\frac{\partial \alpha_t(j)}{\partial a_{ij}} = b_j(o_t)\, \alpha_{t-1}(i).$    (36)

Eqs. 34-36 give $\partial L_{tot} / \partial a_{ij}$, and substituting this quantity in eq. 33 (keeping in mind that $\theta = a_{ij}$ in this case), we get the required result,

$\frac{\partial J}{\partial a_{ij}} = -\frac{1}{L_{tot}} \sum_{t=1}^{T} \beta_t(j)\, b_j(o_t)\, \alpha_{t-1}(i).$    (37)

Gradient with respect to the observation probabilities. Using the chain rule,

$\frac{\partial L_{tot}}{\partial b_j(o_t)} = \frac{\partial L_{tot}}{\partial \alpha_t(j)} \frac{\partial \alpha_t(j)}{\partial b_j(o_t)}.$    (38)

Differentiating (a time shifted version of) eq. 11 with respect to $b_j(o_t)$,

$\frac{\partial \alpha_t(j)}{\partial b_j(o_t)} = \frac{\alpha_t(j)}{b_j(o_t)}.$    (39)

Finally we get the required derivative by substituting for $\partial L_{tot} / \partial b_j(o_t)$ in eq. 33 (keeping in mind that $\theta = b_j(o_t)$ in this case), which is obtained by substituting eqs. 39 and 35 in eq. 38:

$\frac{\partial J}{\partial b_j(o_t)} = -\frac{1}{L_{tot}} \frac{\alpha_t(j)\beta_t(j)}{b_j(o_t)}.$    (40)

Usually this is given the following form, by first substituting for $L_{tot}$ from eq. 32 and then substituting from eq. 25:

$\frac{\partial J}{\partial b_j(o_t)} = -\frac{\gamma_t(j)}{b_j(o_t)}.$    (41)

The same method can be used to propagate the derivative (if necessary) to a front end processor of the HMM.
Dynamic Programming and HMM

Before completing our discussion of the elementary aspects of HMMs, we make a brief detour to show the relationship between dynamic programming (DP) and HMMs. In fact, a DP can be viewed as an $\alpha$-pass where the sum is replaced by a max.


More precisely, for $\pi$, A and B as above, the dynamic programming algorithm can be stated as follows.

1. Let $\delta_0(i) = \pi_i b_i(o_0)$, for $i = 0, 1, \ldots, N-1$.
2. For $t = 1, 2, \ldots, T-1$ and $i = 0, 1, \ldots, N-1$, compute

$\delta_t(i) = \max_{j \in \{0, 1, \ldots, N-1\}} [\delta_{t-1}(j)\, a_{ji}\, b_i(o_t)].$    (42)

At each successive t, the DP determines the probability of the best path ending at each of the states $i = 0, 1, \ldots, N-1$. Consequently, the probability of the best overall path is

$\max_{j \in \{0, 1, \ldots, N-1\}} [\delta_{T-1}(j)].$    (43)

Be sure to note that eq. 43 only gives the optimal probability, not the optimal path itself. By keeping track of each preceding state, the DP procedure can be used to recover the optimal path by tracing back from the highest-scoring final state. The dynamic programming algorithm only needs to maintain the highest-scoring paths at each possible state, not a list of all possible paths. This is the key to the efficiency of the DP algorithm.

Underflow is a concern with a dynamic programming problem of this form, since we compute products of probabilities. Fortunately, underflow is easily avoided by simply taking logarithms. The underflow-resistant DP algorithm is

1. Let $\delta_0(i) = \log[\pi_i b_i(o_0)]$, for $i = 0, 1, \ldots, N-1$.
2. For $t = 1, 2, \ldots, T-1$ and $i = 0, 1, \ldots, N-1$, compute

$\delta_t(i) = \max_{j \in \{0, 1, \ldots, N-1\}} \big[\delta_{t-1}(j) + \log[a_{ji}] + \log[b_i(o_t)]\big].$    (44)

In this case, the optimal score is

$\max_{j \in \{0, 1, \ldots, N-1\}} [\delta_{T-1}(j)].$    (45)

Of course, additional bookkeeping is still required in order to find the optimal path.
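The log-domain DP of eqs. 44-45 is essentially the Viterbi algorithm with back-pointers. A minimal sketch is given below (an added illustration, not part of the original chapter); it reuses the hypothetical A, B and pi arrays introduced earlier.

```python
import numpy as np

def viterbi_log(A, B, pi, obs):
    """Return (best_log_prob, best_path) for an integer-coded observation sequence."""
    T, N = len(obs), A.shape[0]
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)            # back-pointer to the "winning state"
    delta[0] = logpi + logB[:, obs[0]]            # initialisation, step 1
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA     # scores[j, i] = delta_{t-1}(j) + log a_ji
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]   # eq. (44)
    best_last = int(delta[-1].argmax())           # eq. (45)
    path = [best_last]
    for t in range(T - 1, 0, -1):                 # trace back from the best final state
        path.append(int(back[t, path[-1]]))
    return float(delta[-1].max()), path[::-1]

print(viterbi_log(A, B, pi, [0, 2, 1, 1]))
```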

HMM Scaling
The HMM solutions discussed all require computations involving products of probabilities. It is easy to see, for example, that $\alpha_t(i)$ tends to 0 exponentially as T increases. Therefore, any attempt to implement the formulas as given above will inevitably result in underflow. The solution to this underflow problem is to scale the numbers. However, care must be taken to ensure that, for example, the re-estimation formulas remain valid.

First, consider the computation of $\alpha_t(i)$. The basic recurrence is

$\alpha_t(i) = \sum_{j=0}^{N-1} \alpha_{t-1}(j)\, a_{ji}\, b_i(o_t).$    (46)

It seems sensible to normalize each $\alpha_t(i)$ by dividing by the sum (over j) of $\alpha_t(j)$. However, we must verify that the re-estimation formulas still hold.

For $t = 0$, let $\tilde{\alpha}_0(i) = \alpha_0(i)$ for $i = 0, 1, \ldots, N-1$. Then let $c_0 = 1 / \sum_{j=0}^{N-1} \tilde{\alpha}_0(j)$ and, finally, $\hat{\alpha}_0(i) = c_0 \tilde{\alpha}_0(i)$ for $i = 0, 1, \ldots, N-1$. Then for each $t = 1, 2, \ldots, T-1$ do the following.

1. For $i = 0, 1, \ldots, N-1$, compute

$\tilde{\alpha}_t(i) = \sum_{j=0}^{N-1} \hat{\alpha}_{t-1}(j)\, a_{ji}\, b_i(o_t).$    (47)

2. Let

$c_t = \frac{1}{\sum_{j=0}^{N-1} \tilde{\alpha}_t(j)}.$    (48)

3. For $i = 0, 1, \ldots, N-1$, compute

$\hat{\alpha}_t(i) = c_t \tilde{\alpha}_t(i).$

Clearly, $\hat{\alpha}_0(i) = c_0 \alpha_0(i)$. Suppose that

$\hat{\alpha}_t(i) = c_0 c_1 \cdots c_t\, \alpha_t(i).$    (49)

Then

$\hat{\alpha}_{t+1}(i) = c_{t+1} \tilde{\alpha}_{t+1}(i) = c_{t+1} \sum_{j=0}^{N-1} \hat{\alpha}_t(j)\, a_{ji}\, b_i(o_{t+1}) = c_0 c_1 \cdots c_t c_{t+1} \sum_{j=0}^{N-1} \alpha_t(j)\, a_{ji}\, b_i(o_{t+1}) = c_0 c_1 \cdots c_{t+1}\, \alpha_{t+1}(i),$    (50)

and hence (49) holds, by induction, for all t.

From (49) and the definitions of $\alpha$ and $\hat{\alpha}$ it follows that

$\hat{\alpha}_t(i) = \frac{\alpha_t(i)}{\sum_{j=0}^{N-1} \alpha_t(j)},$    (51)

and hence the $\hat{\alpha}_t(i)$ are the desired scaled values of $\alpha_t(i)$ for all t. As a consequence of (51),

$\sum_{j=0}^{N-1} \hat{\alpha}_{T-1}(j) = 1.$    (52)

Also, from (49) we have

$\sum_{j=0}^{N-1} \hat{\alpha}_{T-1}(j) = c_0 c_1 \cdots c_{T-1} \sum_{j=0}^{N-1} \alpha_{T-1}(j) = c_0 c_1 \cdots c_{T-1}\, P\{O \mid \lambda\}.$    (53)

Combining these results gives

$P\{O \mid \lambda\} = \frac{1}{\prod_{j=0}^{T-1} c_j}.$    (54)

To avoid underflow, we instead compute

$\log[P\{O \mid \lambda\}] = -\sum_{j=0}^{T-1} \log c_j.$    (55)

The same scale factor is used for $\beta_t(i)$ as was used for $\alpha_t(i)$, namely $c_t$, so that $\hat{\beta}_t(i) = c_t \beta_t(i)$. We then compute $\xi_t(i, j)$ and $\gamma_t(i)$ using the formulas of the previous section with $\hat{\alpha}_t(i)$ and $\hat{\beta}_t(i)$ in place of $\alpha_t(i)$ and $\beta_t(i)$, respectively. These values are then used to re-estimate $\pi$, A and B. By writing the original re-estimation formulas (27), (28) and (29) directly in terms of $\alpha_t(i)$ and $\beta_t(i)$, it is an easy exercise to show that the re-estimated $\pi$, A and B are exact when $\hat{\alpha}_t(i)$ and $\hat{\beta}_t(i)$ are used in place of $\alpha_t(i)$ and $\beta_t(i)$. Furthermore, $P\{O \mid \lambda\}$ is not required in the re-estimation formulas, since in each case it cancels in the numerator and denominator. Therefore, eq. 55 can be used to verify that $P\{O \mid \lambda\}$ is increasing at each iteration. Fortunately, we have no need for the actual value of $P\{O \mid \lambda\}$, the calculation of which would inevitably result in underflow.

Pseudo-code

Here we give complete pseudo-code for solving Problem 3, including scaling. This pseudo-code also provides everything needed to solve Problems 1 and 2.


The values N and M are fixed and the T observations

$O = o_0, o_1, \ldots, o_{T-1}$

are assumed known.

1. Initialization: Select initial values for the matrices A, B and π, where π is 1 × N, while A = {a_ij} is N × N and B = {b_j(k)} is N × M, and all three matrices are row-stochastic. If known, use reasonable approximations for the matrix values; otherwise let π_i ≈ 1/N, a_ij ≈ 1/N and b_j(k) ≈ 1/M. Be sure that each row sums to 1 and the elements of each matrix are not uniform. Let

   maxIters = maximum number of re-estimation iterations
   iters = 0
   oldLogProb = -∞

2. The α-pass

   // compute α_0(i)
   c_0 = 0
   for i = 0 to N-1
      α_0(i) = π_i b_i(o_0)
      c_0 = c_0 + α_0(i)
   next i
   // scale the α_0(i)
   c_0 = 1/c_0
   for i = 0 to N-1
      α_0(i) = c_0 α_0(i)
   next i
   // compute α_t(i)
   for t = 1 to T-1
      c_t = 0
      for i = 0 to N-1
         α_t(i) = 0
         for j = 0 to N-1
            α_t(i) = α_t(i) + α_{t-1}(j) a_ji
         next j
         α_t(i) = α_t(i) b_i(o_t)
         c_t = c_t + α_t(i)
      next i
      // scale α_t(i)
      c_t = 1/c_t
      for i = 0 to N-1
         α_t(i) = c_t α_t(i)
      next i
   next t

3. The β-pass

   // let β_{T-1}(i) = 1, scaled by c_{T-1}
   for i = 0 to N-1
      β_{T-1}(i) = c_{T-1}
   next i
   // β-pass
   for t = T-2 to 0 by -1
      for i = 0 to N-1
         β_t(i) = 0
         for j = 0 to N-1
            β_t(i) = β_t(i) + a_ij b_j(o_{t+1}) β_{t+1}(j)
         next j
         // scale β_t(i) with the same scale factor as α_t(i)
         β_t(i) = c_t β_t(i)
      next i
   next t

4. Compute γ_t(i, j) and γ_t(i)

   for t = 0 to T-2
      denom = 0
      for i = 0 to N-1
         for j = 0 to N-1
            denom = denom + α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)
         next j
      next i
      for i = 0 to N-1
         γ_t(i) = 0
         for j = 0 to N-1
            γ_t(i, j) = (α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j)) / denom
            γ_t(i) = γ_t(i) + γ_t(i, j)
         next j
      next i
   next t

5. Re-estimate A, B and π

   // re-estimate π
   for i = 0 to N-1
      π_i = γ_0(i)
   next i
   // re-estimate A
   for i = 0 to N-1
      for j = 0 to N-1
         numer = 0
         denom = 0
         for t = 0 to T-2
            numer = numer + γ_t(i, j)
            denom = denom + γ_t(i)
         next t
         a_ij = numer/denom
      next j
   next i
   // re-estimate B
   for i = 0 to N-1
      for j = 0 to M-1
         numer = 0
         denom = 0
         for t = 0 to T-2
            if (o_t == j) then
               numer = numer + γ_t(i)
            end if
            denom = denom + γ_t(i)
         next t
         b_i(j) = numer/denom
      next j
   next i

6. Compute log[P(O | λ)]

   logProb = 0
   for i = 0 to T-1
      logProb = logProb + log(c_i)
   next i
   logProb = -logProb

7. To iterate or not to iterate, that is the question...

   iters = iters + 1
   if (iters < maxIters and logProb > oldLogProb) then
      oldLogProb = logProb
      goto 2
   else
      output λ = (π, A, B)
   end if
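For readers who prefer working code, the snippet below is a rough Python rendering of step 2 (the scaled α-pass) and step 6 (the log-likelihood); it is an added illustration, not the original pseudo-code, and it reuses the hypothetical model arrays from the earlier sketches.

```python
import numpy as np

def scaled_forward_loglik(A, B, pi, obs):
    """Scaled alpha-pass; returns (alpha_hat, c, log P(O | lambda))."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = 1.0 / alpha[0].sum()
    alpha[0] *= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = 1.0 / alpha[t].sum()
        alpha[t] *= c[t]                    # scaled alpha-hat, eq. (49)
    log_prob = -np.log(c).sum()             # eq. (55)
    return alpha, c, log_prob

print(scaled_forward_loglik(A, B, pi, [0, 2, 1, 1])[2])
```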


SUMMARY

The success of HMMs in speech recognition has led many researchers to apply them to handwriting recognition by representing each word image as a sequence of observations. In fact, the problem of large vocabulary handwriting recognition is turned into an optimisation problem that consists of evaluating all the possible solutions and choosing the best one, that is, the solution that is optimal under certain criteria. The main problem is that the number of possible hypotheses grows as a function of the lexicon size and the number of sub-word units, and that imposes formidable computation requirements on the implementation of search algorithms (Chen et al., 1994). When using HMMs for a classification problem, an individual HMM is designed for each pattern class. For each observation sequence, i.e. for each sequence of feature vectors, the likelihood that this sequence was produced by the HMM of a class can be calculated. The class whose HMM achieves the highest likelihood is considered to be the class that produced the actual sequence of observations (Simon, 2004). There are two possibilities for defining the likelihood of an observation sequence for a given HMM. Either the highest likelihood over all possible state sequences is used (Viterbi recognition), or the sum of the likelihoods of all possible state sequences is considered as the likelihood of the observation sequence (Baum-Welch recognition). The Viterbi algorithm is optimal in the sense of maximum likelihood and it looks at the match of the whole sequence of features (observations) before deciding on the most likely state sequence. This is particularly valuable in applications such as handwritten word recognition, where an intermediate character may be garbled or lost, but the overall sense of the word may be detectable. On the other hand, the local information is somewhat overlooked. Furthermore, the conditional independence imposed by the Markov Model (each observation is independent of its


neighbors) prevents an HMM from taking full advantage of the correlation that exists among the observations of a single character (Alessandro et al., 2002). Hidden Markov Models (HMMs), which can be thought of as a generalisation of dynamic programming techniques, have become the predominant approach to automatic speech recognition (Ney and Ortmanns, 1999). The HMM is a parametric modelling technique, in contrast with the non-parametric DP algorithm. The power of the HMM lies in the fact that the parameters that are used to model the handwriting signal can be well optimised, and this results in lower computational complexity in the decoding procedure, as well as improved recognition accuracy (Koerich, 2003). Furthermore, other knowledge sources can also be represented with the same structure, which is one of the important advantages of Hidden Markov Modelling (Huang et al., 1990).

REFERENCES

ALESSANDRO L., KOERICH Y. L., ROBERT S., CHING Y. S., 2002. A Hybrid Large Vocabulary Handwritten Word Recognition System using Neural Networks with Hidden Markov Models. Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition (IWFHR'02), IEEE. CHEN M.Y., KUNDU A., AND SRIHARI S. N., 1995. Variable duration hidden Markov model and morphological segmentation for handwritten word recognition. IEEE Transactions on Image Processing, vol. 4, no. 12, pp. 1675-1688. HUANG X., ARIKI Y., JACK M.A., 1990. Hidden Markov Models for Speech Recognition. Edinburgh University Press. KUNDU A., 1997. Handwritten word recognition using hidden Markov model. In: H. Bunke and P. Wang, Editors, Handbook of character recognition and document image analysis, World Scientific, Singapore, pp. 15718.


NEY H., ORTMANNS S., 1999. Dynamic programming search for continuous speech recognition. IEEE Signal Processing Magazine, vol. 16, no. 5, pp. 64-83. RABINER, L., JUANG, B., 1993. Fundamentals of Speech Recognition. Prentice Hall Signal Processing Series, Englewood Cliffs. SIMON GUNTER, 2004. Multiple Classifier Systems in On-line Cursive Handwriting Recognition. PhD thesis, Universität Bern. SIMON G. AND HORST B., 2005. Off-line cursive handwriting recognition using multiple classifier systems - on the influence of vocabulary, ensemble, and training set size. Optics and Lasers in Engineering, vol. 43, Issues 3-5, pp. 437-454.

4
SEGMENTATION OF BRAIN MR IMAGES
M. Masroor Ahmed Dzulkifli Mohamad

INTRODUCTION

Segmentation of images holds an important position in the area of image processing. It becomes even more important when dealing with medical images, where pre-surgery and post-surgery decisions are required for the purpose of initiating and speeding up the recovery process (noodle.med.yale.edu 1993). Computer aided detection of abnormal growth of tissues is primarily motivated by the necessity of achieving maximum possible accuracy. Manual segmentation of these abnormal tissues cannot compete with modern high speed computing machines, which enable us to visually observe the volume and location of unwanted tissues. A well known segmentation problem within MRI is the task of labeling voxels according to their tissue type, which includes White Matter (WM), Grey Matter (GM), Cerebrospinal Fluid (CSF) and sometimes pathological tissues such as tumor. This chapter describes an efficient method for automatic brain tumor segmentation for the extraction of tumor tissues from MR images. It combines the Perona and Malik (1990) anisotropic diffusion model for image enhancement with the K-means clustering technique for grouping tissues belonging to a specific class. The proposed


method uses T1, T2 and PD weighted gray level intensity images. The proposed technique produced appreciable results. The developments in the application of information technology have completely changed the world. The obvious reasons for the introduction of computer systems are reliability, accuracy, simplicity and ease of use. Besides, the customization and optimization features of a computer system stand among the major driving forces in adopting and subsequently strengthening computer aided systems. In medical imaging, an image is captured, digitized and processed for segmentation and for extracting important information. Manual segmentation is an alternative method for segmenting an image. This method is not only tedious and time consuming, but also produces inaccurate results, and segmentation by experts is variable (Warfield S. et al., 1995). Therefore, there is a strong need for an efficient computer based system that accurately defines the boundaries of brain tissues while minimizing the amount of user interaction with the system (Matthew C. Clark, 1994). Additionally, the manual segmentation process requires at least three hours to complete (Mancas M. et al., 2005). According to Dong-yong Dai et al. (1993), the traditional methods for measuring tumor volumes are not reliable and are error sensitive.
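As a rough illustration of the clustering step described above (an added sketch, not code from the chapter), the snippet below stacks co-registered T1-, T2- and PD-weighted intensities into per-voxel feature vectors and groups them with K-means; the array names and the choice of four clusters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_tissues(t1, t2, pd, n_clusters=4):
    """t1, t2, pd: co-registered intensity arrays of equal shape.
    Returns a label image of the same shape, one label per tissue cluster."""
    features = np.stack([t1.ravel(), t2.ravel(), pd.ravel()], axis=1).astype(float)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    return labels.reshape(t1.shape)

# Hypothetical usage with random data standing in for real MR slices.
t1 = np.random.rand(128, 128)
t2 = np.random.rand(128, 128)
pd = np.random.rand(128, 128)
label_img = cluster_tissues(t1, t2, pd)
```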

PREVIOUS WORK

Various segmentation methods have been cited in the literature for improving the segmentation processes and for introducing maximum possible reliability, for example:


Segmentation by Thresholding

Thresholding is frequently used for image segmentation. It is a simple and effective segmentation method for images with distinct intensity ranges. The technique basically attempts to find a threshold value that enables the classification of pixels into different categories. A major weakness of this segmentation mode is that it generates only two classes; therefore, it cannot deal with multichannel images. Besides, it ignores spatial characteristics, which makes it sensitive to noise and to the intensity inhomogeneity that is commonly found in MRI; both of these effects can corrupt the histogram of the image. To overcome these problems, various versions of the thresholding technique have been introduced that segment medical images by using information based on local intensities and connectivity (Dzung L. Pham et al., 1998). Although this is a simple technique, several factors can still complicate the thresholding operation, for example non-stationary and correlated noise, ambient illumination, busyness of gray levels within the object and its background, inadequate contrast, and object size not commensurate with the scene (Sezgin et al., 2004). Chowdhury et al. (1995) introduced an image thresholding method based on the divergence function. In this method, the objective function is constructed using the divergence between the classes, the object and the background, and the required threshold is found where this divergence function shows a global minimum.
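As an illustration of the basic two-class idea (not of the divergence-based method cited above), the following minimal sketch picks a single global threshold by iteratively splitting on the foreground/background means and then labels pixels above it as object and below as background; the function name and default parameters are illustrative assumptions only.

```python
import numpy as np

def threshold_segment(img, t=None, max_iter=50):
    """Two-class thresholding: label pixels above t as object, below as background."""
    img = img.astype(np.float64)
    if t is None:
        t = img.mean()                          # initial guess
        for _ in range(max_iter):
            fg, bg = img[img > t], img[img <= t]
            if fg.size == 0 or bg.size == 0:    # degenerate split, stop
                break
            new_t = 0.5 * (fg.mean() + bg.mean())
            if abs(new_t - t) < 0.5:            # converged
                break
            t = new_t
    return (img > t).astype(np.uint8)           # 1 = object, 0 = background
```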

Region Growing Method

According to Zhou et al. (2005), owing to its high reliability and its accurate measurement of the dimensions and location of tumors, MRI is frequently used for observing brain pathologies. Previously, region growing and shape-based methods were heavily relied upon for this purpose. Zhigeng Pan and Jianfeng Lu (2007) proposed a Bayes-based region growing algorithm that estimates parameters by studying characteristics in local regions and constructs the Bayes factor as a classifying criterion. The technique is not fully automatic: it requires user interaction for the selection of a seed, and it fails to produce acceptable results in natural images, working only in homogeneous areas. Since this technique is noise sensitive, the extracted regions might contain holes or even discontinuities (Dzung L. Pham et al., 1998). The shape-based method provides an alternative approach for the segmentation of brain tumors, but its degree of freedom is also limited: the algorithm demands an initial contour plan for extracting the region of interest and is therefore, like the region growing approach, only semi-automatic. Both of these methods are error sensitive, because an improper or false initial plan or a wrong selection of the seed will lead to disastrous results. Statistical methods and fuzzy logic approaches seem to be reliable and are the best candidates for replacing the above-mentioned techniques.

Supervised and Un-Supervised Segmentation Methods

Supervised and unsupervised methods for image processing are frequently applied (Matthew C. Clark, 1994; Guillermo N. and Virginia L., 2005). Bezdek et al. (1993) present a technically detailed review of these techniques. Velthuizen et al. (1995) attempted to segment the volume as a whole using KNN and both hard and fuzzy c-means clustering; results showed, however, that there appears to be enough data non-uniformity between slices to prevent satisfactory segmentation. Supervised classification requires sufficient known pixels to generate representative parameters for each class of interest. In unsupervised classification, prior knowledge of the classes is not required; it usually employs some clustering algorithm for classifying the image data. According to Guillermo N. Abras and Virginia L. Ballarin (2005), window classifiers are supervised classification algorithms, whereas unsupervised classification algorithms include K-means, minimum distance, maximum distance and hierarchical clustering.

METHODOLOGY

A brain image consists of four regions, i.e. gray matter (GM), white matter (WM), cerebrospinal fluid (CSF) and background, which can be treated as four different classes. An input image therefore needs to be divided into these four classes. In order to avoid misclassification, the outer elliptical-shaped object (the skull) should be removed; removing it gets rid of the non-brain tissues and leaves only the soft tissues. In this experiment we used T1, T2 and PD weighted brain MRIs. These images have the same size and the same pixel intensity range. Each pixel of the image under consideration is assigned to one of the aforementioned classes. Finally, by applying certain post-processing operations, the tumorous region can be extracted. Figure 1 shows the methodology of this work. The process uses the K-means algorithm to solve the clustering problem; this algorithm aims at minimizing an objective function, in this case a squared-error function. Mathematically, this objective function can be represented as:

J = \sum_{j=1}^{k} \sum_{i=1}^{n} \| x_i^{(j)} - c_j \|^2          (1)

where \| x_i^{(j)} - c_j \|^2 is a chosen distance measure between a data point x_i^{(j)} and the cluster centre c_j, and J is an indicator of the distance of the n data points from their respective cluster centres.

The image is read from the database. It contains skull tissues, which are non-brain elements and should therefore be removed in the preprocessing step, since their presence might lead to misclassification. Figure 2 shows an image of the brain with the skull seen as an outer elliptical ring. In Figure 3 this elliptical ring has been removed and only the soft tissues remain. This is done by employing the morphological operations of erosion and dilation. Mathematically, these functions can be expressed as:

A \ominus B = \{ w : B_w \subseteq A \}          (2)

A \oplus B = \bigcup_{x \in B} A_x          (3)
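A minimal sketch of this preprocessing-plus-clustering step is given below. It assumes a single 2-D gray-level slice, uses scipy.ndimage for the erosion/dilation of equations (2)-(3) and scikit-learn's KMeans for the clustering of equation (1); the function name, the mean-based initial mask and the parameter values are illustrative choices, not the chapter's exact implementation.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def strip_skull_and_cluster(slice_2d, n_classes=3, n_iter_morph=3):
    """Remove the outer skull ring with erosion/dilation (eqs. 2-3), then group
    the remaining soft-tissue pixels into n_classes tissue clusters (eq. 1)."""
    # crude brain mask: threshold at the mean, erode away the thin skull ring,
    # keep the largest connected component, then dilate back towards the boundary
    mask = slice_2d > slice_2d.mean()
    mask = ndimage.binary_erosion(mask, iterations=n_iter_morph)
    labels, n = ndimage.label(mask)
    if n > 0:
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        mask = labels == (np.argmax(sizes) + 1)
    mask = ndimage.binary_dilation(mask, iterations=n_iter_morph)

    # K-means on the intensities of the masked (soft-tissue) pixels
    intensities = slice_2d[mask].reshape(-1, 1).astype(np.float64)
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(intensities)

    segmented = np.zeros(slice_2d.shape, dtype=np.int32)
    segmented[mask] = km.labels_ + 1            # label 0 is kept for non-brain
    return segmented
```

On a single slice loaded as a NumPy array, calling strip_skull_and_cluster(slice_2d) would return an integer label map whose tissue clusters can then be post-processed as outlined in Figure 1.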


Figure 1

Methodology (Read Image Database → Preprocessing: Image Enhancement, Skull Stripping → Classification → Post-processing: Extracting WM, GM, CSF and Tumor; Morphological Operations; Accumulation of Tumor Volume)


Figure 2

Image with Outer Ring (Skull)

Figure 3

Removing Skull Tissues

To test the algorithm, white Gaussian noise is added to the input image, which is then processed for enhancement. The Perona and Malik (1990) model is used for this purpose; it uses a partial differential equation for image denoising and enhancement. Figures 4 and 5 show an image with noise and the enhanced image, respectively. The model smooths the image without losing important details with the help of the following mathematical relation (Izquierdo, E. and Li-Qun Xu, 2000):

I_t = \nabla \cdot [ f(x, y, t) \nabla I ] = \mathrm{div}( f(x, y, t) \nabla I )          (4)

where I(x, y, t) is the intensity value of a pixel at sampling position (x, y) and scale t, and f(x, y, t) is the diffusivity acting on the system. The diffusivity function in the Perona and Malik model is given by the following mathematical relation:

f(x, y, t) = f( \| \nabla I \|^2 ) = \frac{1}{1 + \| \nabla I \|^2 / k^2}          (5)
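A compact explicit-scheme sketch of equations (4)-(5) is shown below, assuming a 2-D gray-level slice; the iteration count, the edge threshold k and the time step are illustrative defaults rather than values taken from the chapter.

```python
import numpy as np

def perona_malik(image, n_iter=15, k=15.0, lam=0.2):
    """Explicit scheme for Perona-Malik anisotropic diffusion (eqs. 4-5).

    k   : edge threshold of the diffusivity f = 1 / (1 + |grad I|^2 / k^2)
    lam : time step (<= 0.25 for stability with 4 neighbours)
    """
    I = image.astype(np.float64).copy()
    for _ in range(n_iter):
        # finite differences towards the four nearest neighbours
        dN = np.roll(I, -1, axis=0) - I
        dS = np.roll(I,  1, axis=0) - I
        dE = np.roll(I, -1, axis=1) - I
        dW = np.roll(I,  1, axis=1) - I
        # diffusivity of eq. (5) evaluated on each directional gradient
        cN = 1.0 / (1.0 + (dN / k) ** 2)
        cS = 1.0 / (1.0 + (dS / k) ** 2)
        cE = 1.0 / (1.0 + (dE / k) ** 2)
        cW = 1.0 / (1.0 + (dW / k) ** 2)
        # discrete divergence update of eq. (4)
        I += lam * (cN * dN + cS * dS + cE * dE + cW * dW)
    return I
```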


Figure 4

Noisy Image

Figure 5

Enhanced Image

RESULTS AND CONCLUSION

It has been observed that when the Perona and Malik (1990) model is combined with the K-means algorithm, it produces reliable results. Due to the unsupervised nature of the approach, the proposed system is efficient and less error sensitive.

Figure 6

(a) Original Image (b) Skull Removed (c) Segmented Image (d) Extracting WM (e) WM after Intensity Correction (f) Extracting GM (g) GM after Intensity Correction (h) Removing


It can be deduced from the results that unsupervised segmentation methods are better suited here than supervised ones, because a supervised method needs considerable pre-processing and, more importantly, requires a substantial amount of training and testing data, which complicates the process; the present approach, in contrast, can be applied to a minimal amount of data with reliable results. It may also be noted that the K-means clustering method is fairly simple compared with the frequently used fuzzy clustering methods; efficiency and simple output are fundamental features of K-means clustering (Dmitriy Fradkin and Ilya Muchnik, 2004). To check the accuracy of the proposed method, the mean and standard deviation of the clean image, of the noisy image containing white Gaussian noise, and of the enhanced image are plotted in Figure 8. Figure 7 shows results from an image enhanced by the Perona-Malik anisotropic diffusion model and from an image corrupted with Gaussian noise; there is a significant difference between the two. The tumor extracted from the noisy image marks various portions of the MR slice that even contain normal tissues, whereas the results obtained from the enhanced image and the clean image are almost identical. The accuracy of the proposed method can also be deduced from Figure 8, in which the mean and standard deviation of the MR image in various combinations are shown. Due to the very small amount of remaining noise, the means and standard deviations plotted in Figure 8(d) lie in almost the same range.


Figure 7

(a) Deleting normal tissues from enhanced MRI slice (b) Segmentation of enhanced MRI slice (c) Extraction of tumor (d) Noisy image showing only normal tissues (e) Segmentation of noisy image (f) Deleting normal tissues and retaining tumor cells


Figure 8

(a) Mean and Standard Deviations of Clean, Noisy and Enhanced Image (b) Mean and Standard Deviations of Noisy and Enhanced Image (c) Mean and Standard Deviations of clean and Enhanced Image (d) Mean and Standard Deviations of clean and Noisy Image

REFERENCES

BEZDEK, J.C., HALL, L.O. AND CLARKE, L.P., 1993. Review of MR image segmentation techniques using pattern recognition. Medical Physics, vol. 20, no. 4, pp. 1033.
CHOWDHURY, M.H. AND LITTLE, W.D., 1995. Image thresholding techniques. IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, Proceedings, pp. 585-589.
DMITRIY FRADKIN AND ILYA MUCHNIK, 2004. A Study of K-Means Clustering for Improving Classification Accuracy of Multi-Class SVM. Technical Report, Rutgers University, New Brunswick, New Jersey 08854.
DONG-YONG DAI, CONDON, B., HADLEY, D., RAMPLING, R. AND TEASDALE, G., 1993. Intracranial deformation caused by brain tumors: assessment of 3-D surface by magnetic resonance imaging. IEEE Transactions on Medical Imaging, vol. 12, issue 4, pp. 693-702.
DZUNG L. PHAM, CHENYANG XU AND JERRY L. PRINCE, 1998. A Survey of Current Methods in Medical Image Segmentation. Technical Report JHU/ECE 99-01, Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore MD 21218.
GUILLERMO N. ABRAS AND VIRGINIA L. BALLARIN, 2005. A Weighted K-means Algorithm applied to Brain Tissue Classification. JCS&T, vol. 5, no. 3.
IZQUIERDO, E. AND LI-QUN XU, 2000. Image segmentation using data-modulated nonlinear diffusion. Electronics Letters, vol. 36, issue 21, pp. 1767-1769.
MANCAS, M., GOSSELIN, B. AND MACQ, B., 2005. Segmentation Using a Region Growing Thresholding. Proc. of the Electronic Imaging Conference of the International Society for Optical Imaging (SPIE/EI 2005), San Jose, California, USA.
MATTHEW C. CLARK, 1994. Segmenting MRI Volumes of the Brain with Knowledge-Based Clustering. MS Thesis, Department of Computer Science and Engineering, University of South Florida.
PAN ZHIGENG AND LU JIANFENG, 2007. A Bayes-Based Region-Growing Algorithm for Medical Image Segmentation. Computing in Science & Engineering, vol. 9, issue 4, pp. 32-38.
PERONA, P. AND MALIK, J., 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, issue 7, pp. 629-639.
SEZGIN, M. AND SANKUR, B., 2004. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging, vol. 13, no. 1, pp. 146-165.
VELTHUIZEN, R.P., CLARKE, L.P., PHUPHANICH, S., HALL, L.O., BENSAID, A.M., ARRINGTON, J.A., GREENBERG, H.M. AND SILBIGER, M.L., 1995. Unsupervised Tumor Volume Measurement Using Magnetic Resonance Brain Images. Journal of Magnetic Resonance Imaging, vol. 5, no. 5, pp. 594-605.
WARFIELD, S., DENGLER, J., ZAERS, J., GUTTMANN, C., GIL, W., ETTINGER, J., HILLER, J. AND KIKINIS, R., 1995. Automatic Identification of Grey Matter Structures from MRI to Improve the Segmentation of White Matter Lesions. Journal of Image Guided Surgery, vol. 1, no. 6, pp. 326-338.
YALE Image Processing and Analysis Group, 1993. <http://noodle.med.yale.edu>
ZHOU, J., CHAN, K.L., CHONG, V.F.H. AND KRISHNAN, S.M., 2005. Extraction of Brain Tumor from MR Images Using One-Class Support Vector Machine. 27th Annual International Conference of the Engineering in Medicine and Biology Society, IEEE-EMBS, pp. 6411-6414.

5
TISSUES SEGMENTATION OF BRAIN MR IMAGES BY UTILIZING ARTIFICIAL NEURAL NETWORK
M. Masroor Ahmed and Dzulkifli Mohamad

INTRODUCTION

The anatomical structure of the human brain is very complex. For diagnosing the presence of a possible pathology, surgeons sometimes have to navigate through each and every pixel to ascertain its possible association with a specific group of tissues. Because of the volume of MRI data produced by its various views, i.e. coronal, axial and sagittal, it is practically impossible to carry out human visual inspection to decide on the health of the brain. Therefore, an efficient automatic system is needed to perform this job with the highest possible degree of reliability and accuracy. This chapter discusses the use of an Artificial Neural Network (ANN) for the segmentation of brain MR images. Unlike CT and X-rays, MRI, due to its flexibility in distinguishing various important tissues, has become a common medical imaging modality; it is also an efficient basis for segmentation, which further contributes significantly to tissue visualization (Tian and Fan, 2007). The various available segmentation methods, like region growing, region merging, thresholding and watershed, provide valuable and vital information to medical experts executing a diagnostic process to observe the presence or absence of a certain abnormality in the human brain. The observations collected from these well-known methods serve two important purposes: they contribute to approximating the tumor size and volume, and they help in determining the change in tumor volume after surgical treatment (Mona, L. et al., 2003). However, these segmentation methods suffer from some fundamental shortcomings. For example, intensity inhomogeneity is caused by the imaging parameters set for the acquisition of MR images; due to this defect, the intensity ranges obtained are not reliable and may well represent the wrong class or area. Additionally, the distinctiveness of the tissue boundaries is lost due to the partial volume effect. Because of the sensitivity to these problems, the commonly used intensity-based methods fail to classify MR images properly (Shi Juan He et al., 2000).

Since the inception of MR imaging for medical research and examination, a number of MR image segmentation methods have been researched and developed. Among them, AI techniques such as ANNs, due to their inherent capability of self-learning, fault tolerance and optimum search, have become a focal point for researchers. On top of that, an ANN is competent enough to exercise effective control over various uncertainties, for example temperature and electrical noise, magnetic field inhomogeneity, individual differences and disordered suppression of tissues (Wei Sun and Yaonan Wang, 2005). The Bayesian classification method also holds an important position among artificial-intelligence-based segmentation approaches; it allows for the possibility of having more than one material in a voxel, and the information drawn on the basis of the adjacency phenomenon makes a distinctive contribution to the classification process (Laidlaw et al., 1994). Li et al. (1993) made straightforward use of the Fuzzy C-Means (FCM) algorithm to solve the segmentation problem of brain MR images. It may be noted, however, that the FCM-based approach is sensitive to intensity imbalance: if a candidate image has some intensity non-uniformity in it, the algorithm is expected to produce erroneous results. Therefore, a balance in the intensity of the images is always needed. Concentrating on this fact, Pham and Prince (1999) extended the traditional FCM to control the intensity inhomogeneity that an MR image is likely to acquire during the acquisition process. Like the Bayesian and artificial neural network methods, the Expectation Maximization (EM) algorithm is another potential candidate for achieving the objective of image segmentation (Pham DL and Prince JL, 1999). The EM algorithm was initially introduced by Wells. A major point that attracted the attention of researchers to refine the EM-based segmentation method is its lack of spatial information; Kapur (1999) successfully introduced spatial information into the traditional EM algorithm.

Figure 1

Neighborhood connectivity: adjacent, diagonal and 3 x 3

Similarly, image segmentation can also be achieved by the region growing method. This method revolves around the acquisition of information from neighboring pixels, making use of the adjacency phenomenon shown in Figure 1. The decision is made on the basis of the extracted information, i.e. if the intensity values of the neighboring pixels are close or similar, the region keeps on expanding; the expansion halts on finding sufficiently different values. The algorithm then groups the pixels having similar values and marks them as one class. The process keeps on iterating until it has scanned the whole image.
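A toy sketch of this idea is shown below: starting from a manually chosen seed, 4-connected neighbours whose intensity stays within a tolerance of the seed value are absorbed into the region; the function name and the tolerance parameter are illustrative assumptions.

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10.0):
    """Intensity-based region growing from a seed pixel using 4-connected neighbours."""
    img = img.astype(np.float64)
    region = np.zeros(img.shape, dtype=bool)
    seed_val = img[seed]
    queue = deque([seed])
    region[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and not region[nr, nc]
                    and abs(img[nr, nc] - seed_val) <= tol):
                region[nr, nc] = True            # pixel is similar: absorb it
                queue.append((nr, nc))
    return region
```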


A noticeable shortcoming of region-based segmentation is that it needs a seed pixel, which is generally marked manually. Another point that puts a question mark over this method is that the brain has a very complex structure: how reliable can this type of segmentation be if the seed pixel is wrongly chosen (Qussay and Abdul Rahman, 2001)? In addition to the above-mentioned segmentation methods, deformable contour models are also employed to achieve this objective. This approach primarily concentrates on the internal energy and the external energy, and continues its processing until it reaches a balance between these two energy values. The internal energy is responsible for the smoothness of the curve, whereas the external energy captures local region statistics, such as the first and second order moments of the pixel intensity (Payel and Melanie, 2006).

METHODOLOGY

The architecture of the network includes a total of eighteen input features, nine from each class, with ten hidden nodes and three output nodes. Figure 2 shows the architecture of the network used for the segmentation of the brain MR images.

Back-Propagation Artificial Neural Network Classifier

Due to the recursive and iterative nature of the back-propagation algorithm, some error inevitably arises. In order to quantify that error, the delta rule for minimizing the error is generally employed. This rule states that:

E_i = \frac{1}{2} \sum_j ( x_{ij} - o_{ij} )^2          (1)

According to this equation, the error term is obtained by taking the difference between the desired outputs and the actual observed outputs, summing these (squared) differences and multiplying by 0.5. In the light of this concept, x_{ij} represents the desired output and o_{ij} the observed output in the above equation. The possibility of getting a large error term cannot be ruled out; if that happened, the network would be likely to produce unreliable results. Therefore, some remedial measure is needed to keep the error term small. Generally, this difference is controlled and minimized by adjusting the weights of the network, which is achieved by implementing gradient descent in the generalized delta rule (Freeman and Skapura, 1991):

\Delta w_{jk} = \eta \, \delta_{ij} \, o_{ik}          (2)

Here \eta represents the rate at which the network learns, and \delta_{ij} refers either to a hidden node or to an output node. For a hidden node we have:

\delta_{ij} = f'(net_{ij}) \sum_p \delta_{ip} w_{pj}          (3)

whereas for an output node we have:

\delta_{ij} = ( x_{ij} - o_{ij} ) \, f'(net_{ij})          (4)


Figure 2

Schematic Diagram for ANN

It may be noted here that net_{ij} is in fact the sum of the products of the weights and the observed outputs plus a bias term \theta_j. Therefore, net_{ij} is given by the following relation:

net_{ij} = \sum_k w_{jk} \, o_{ik} + \theta_j          (5)

The following relation gives the output when a certain input is applied; for example, when input k is applied, the output received at node j is:

o_{ij} = f_j ( net_{ij} )          (6)

It is quite evident that to get an output from a neural network we need an activation function; in the relation above, f_j is that activation function. In this study the logsig activation function is employed (there is no particular technical reason for this specific choice). Its mathematical representation is:

f(x) = \frac{1}{1 + e^{-x}}          (7)

With this activation function, equations (3) and (4) assume the following shape:

\delta_{ij} = o_{ij} ( 1 - o_{ij} ) \sum_p \delta_{ip} w_{pj}          (8)

The above equation represents \delta_{ij} for a hidden node, whereas for an output node, substituting the logsig derivative into equation (4) gives \delta_{ij} = ( x_{ij} - o_{ij} ) \, o_{ij} ( 1 - o_{ij} ). Last but not least, a momentum term is added to the learning rate, thereby producing the modified weight updates.
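The sketch below performs one back-propagation update for a single hidden layer following equations (1)-(8); the array shapes, parameter names and learning rate are illustrative assumptions and do not reproduce the chapter's actual network.

```python
import numpy as np

def logsig(x):
    # eq. (7): logistic activation
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, target, w_hidden, w_out, b_hidden, b_out, eta=0.1):
    """One back-propagation update; weights and biases are modified in place."""
    # forward pass, eqs. (5)-(6)
    net_h = w_hidden @ x + b_hidden
    o_h = logsig(net_h)
    net_o = w_out @ o_h + b_out
    o = logsig(net_o)

    # output-node deltas: (x_ij - o_ij) * o_ij * (1 - o_ij)
    delta_o = (target - o) * o * (1.0 - o)
    # hidden-node deltas, eq. (8)
    delta_h = o_h * (1.0 - o_h) * (w_out.T @ delta_o)

    # weight and bias updates, eq. (2)
    w_out += eta * np.outer(delta_o, o_h)
    b_out += eta * delta_o
    w_hidden += eta * np.outer(delta_h, x)
    b_hidden += eta * delta_h

    # eq. (1): squared-error term, useful for monitoring convergence
    return 0.5 * np.sum((target - o) ** 2)
```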


RESULT


REFERENCE

ALEXANDRA LAURIC AND SARAH FRISKEN. Soft Segmentation of CT Brain Data. Tufts University, Halligan Hall Room 102, 161 College Ave, Medford MA 02155, USA.
FREEMAN, J.A. AND SKAPURA, D.M., 1991. Neural Networks: Algorithms, Applications and Programming Techniques. Addison-Wesley Publishing Company.
KAPUR, T., 1999. Model based three dimensional medical image segmentation. Ph.D. thesis, Massachusetts Institute of Technology.
MONA, L., LAMBERTI, F. AND DEMARTINI, C., 2003. A Neural Network Approach to Unsupervised Segmentation of Single-Channel MR Images. Proceedings of the 1st International IEEE EMBS Conference on Neural Engineering, Capri Island, Italy, March 20-22.
LAIDLAW, D.H., FLEISCHER, K.W. AND BARR, A.H., 1994. Classification of Material Mixtures in Volume Data for Visualization and Modeling. Technical Report CS-TR-94-07, California Institute of Technology, Pasadena, CA 91125.
LI, C.L., GOLDGOF, D.B. AND HALL, L.O., 1993. Knowledge-based classification and tissue labeling of MR images of human brain. IEEE Trans. Med. Imag., vol. 12, pp. 740-750.
PAYEL GHOSH AND MELANIE MITCHELL, 2006. Segmentation of medical images using a genetic algorithm. Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, Seattle, Washington, USA, pp. 1171-1178.
PHAM, D.L. AND PRINCE, J.L., 1999. Adaptive fuzzy segmentation of magnetic resonance images. IEEE Trans. Med. Imag., vol. 18, pp. 737-752.
QUSSAY A. SALIH AND ABDUL RAHMAN RAMLI, 2001. Region based Segmentation technique and algorithms for 3D Image. International Symposium on Signal Processing and its Applications (ISSPA), Kuala Lumpur, Malaysia, 13-16 August.
SHI JUAN HE, XIA WENG, YAMEI YANG AND WEILI YAN, 2000. MRI brain images segmentation. The 2000 IEEE Asia-Pacific Conference on Circuits and Systems (IEEE APCCAS 2000), 4-6 December, pp. 113-116.
TIAN, DAN AND FAN, LINAN, 2007. A Brain MR Images Segmentation Method Based on SOM Neural Network. The 1st International Conference on Bioinformatics and Biomedical Engineering (ICBBE 2007), pp. 686-689.
WEI SUN AND YAONAN WANG, 2005. Segmentation Method of MRI Using Fuzzy Gaussian Basis Neural Network. Neural Information Processing Letters and Reviews, vol. 8, no. 2.
WELLS III, W.M., GRIMSON, W.E.L., KIKINIS, R. AND JOLESZ, F.A., 1996. Adaptive segmentation of MRI data. IEEE Trans. Med. Imag., vol. 15, pp. 429-442.

6
AUTOMATIC SEGMENTATION OF WHITE MATTER LESION MRI BRAIN
Novanto Yudistira and Daud Daman

INTRODUCTION

Although MRI (Magnetic Resonance Imaging) ranks very highly among medical image diagnosis tools in neurology, we are often faced with the problem of differentiating tumor boundaries from normal tissue and of evaluating the variation of lesions, often based only on T1 or T2 contrast. Thus, an objective framework or methodology for improving the evaluation of brain tumors, for example in white matter lesion (WML) analysis, as well as in treatment monitoring/follow-up and in surgery and radiation therapy planning, remains a challenge. The use of image processing techniques such as Texture Analysis (TA) has so far provided better accuracy in the characterization of MRI brain tissue, especially when utilizing the Co-occurrence Matrix (COM). We propose an automatic WML segmentation approach which uses a combination of image analysis and pattern recognition methods. There are three main steps in our approach, as summarized in Figure 2.


PRE-PROCESSING

The multiple MRI (Magnetic Resonance Imaging) feature images obtained from the same dataset are co-registered in order to compensate for possible motion between layers. Three dimensions, x, y and z, connect each pixel to its neighbours, so the registration phase is required to match each layer to its neighbouring layers. The image channels are typically T1 and T2 contrast. This step is expected to improve the evaluation of MRI brain tumors.

Figure 1

One layer of MRI brain dataset


Figure 2

The methodology process (3D co-occurrence matrix texture analysis → segmentation by means of SVM → evaluation)


Figure 3

The layers conjugation view from MRI brain dataset

TEXTURE ANALYSIS 3D CO-OCCURRENCE MATRIX (3D-COM)

Two-dimensional (2D) Texture Analysis (TA) methods are not enough to reveal the characteristics needed in brain tumor MR image analysis, since 2D texture analysis only considers a single two-dimensional layer of a three-dimensional structure in an image dataset. A new approach to texture analysis by co-occurrence matrices (COM) is utilized in this research, which aims to improve the result by moving from a 2D to a 3D estimation. The 3D approach is expected to increase the sensitivity and specificity of brain tumor characterization. The new matrix is calculated over several contiguous layers along the classic rectangular X- and Y-axes, and along the Z-axis, in order to evaluate between-layer probabilities. A preliminary comparative evaluation is carried out between 2D-COM and 3D-COM applied to MRI brain tumors for the characterization of solid tumor, necrosis and edema, or on surrounding White Matter (WM) to discriminate different volumes according to their distances from the tumor (D. Mahmoud-Ghoneim et al., 2003).

Figure 4

Collected slices/layers; each pixel is connected with the neighboring pixels (D. Mahmoud-Ghoneim et al., 2003)

The two-dimensional co-occurrence matrix (2D-COM) provides a statistical density function P(i, j) whose purpose is to capture the joint relationship between pairs of neighboring pixels of gray levels i and j. More precisely, the probability density function P_{d,\theta}(i, j) is the probability of finding a pair of pixels composed of a central pixel of gray level i and a neighboring one of gray level j, separated by a distance d (pixel distance) and by an angle \theta taking one of the four values (0, 45, 90 and 135 degrees).
Figure 5

Pixel feature directions (0, 45, 90 and 135 degrees around the pixel of interest)

The co-occurrence matrix characterizes the spatial interrelationships of the gray tones, via feature vectors, in the MRI brain image. Each entry of the co-occurrence matrix gives the weight associated with a pair of neighboring pixels separated by distance d at angle \theta, where one pixel has gray level i and the other gray level j; their joint probability of occurrence is P_{ij}.


The feature vectors included in the estimation are:

a. Contrast. The contrast measures the sharpness of structural variations in the MRI image.

Contrast = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i - j)^2 \, P(i, j)          (1)

b. Inverse difference moment. It is used to measure local homogeneity.

Inverse difference moment = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \frac{P(i, j)}{1 + (i - j)^2}          (2)

c. Correlation. It measures the gray-level linear dependency of the image.

Correlation = \frac{ \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} ij \, P(i, j) - \mu_{row} \mu_{col} }{ \sigma_{row} \sigma_{col} }          (3)

d. Homogeneity. Homogeneity increases as the contrast within the window decreases.

Homogeneity = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} \frac{P_{i,j}}{1 + |i - j|}          (4)

e. Variance. The variance indicates how much the gray levels vary from the mean.

Variance = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i - \mu)^2 \, P_{i,j}          (5)

f. Cluster tendency. The cluster tendency measures how the gray levels emerging in the MR image group together.

Cluster tendency = \sum_{i=1}^{N_g} \sum_{j=1}^{N_g} (i + j - 2\mu)^2 \, P_{i,j}          (6)

The feature vectors are computed for each non-background voxel of each subject in the medical MRI image space. In order to include spatial information from the surroundings of each voxel and make the feature vectors more discriminative in identifying WMLs, each feature vector includes not only the local intensity of the corresponding voxel in the four modality images, but also the intensities of the neighboring voxels in the four modality images; it is therefore convenient to take samples as, for example, 3x3 primitive textures. Normalization of the primitive texture cell features is compulsory to obtain an accurate description of the texture characteristic of each primitive template. If we normalize the matrix P by the total number of pixels, so that each element lies between 0 and 1, we obtain a grey-level co-occurrence matrix C. The normalization used here divides each matrix P_i by its norm, where norm(P_i) is the maximum singular value of P_i:

\hat{P}_i = P_i / \mathrm{norm}(P_i)          (7)
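A small, self-contained sketch of the 2D co-occurrence matrix and of features (1)-(6) is given below; the quantization to 16 grey levels, the single-offset construction, the count-based normalization and the function names are illustrative assumptions (a 3D-COM version would simply add offsets along the Z-axis).

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=16):
    """Normalized grey-level co-occurrence matrix P for one (distance, angle) offset."""
    q = np.floor(image.astype(np.float64) / (image.max() + 1e-9) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)               # quantize to `levels` grey tones
    P = np.zeros((levels, levels), dtype=np.float64)
    rows, cols = q.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[q[r, c], q[r2, c2]] += 1      # count co-occurring grey-level pairs
    return P / max(P.sum(), 1.0)                # elements now lie between 0 and 1

def texture_features(P):
    """Features (1)-(6) computed from a normalized co-occurrence matrix P."""
    i, j = np.indices(P.shape)
    mu_r, mu_c = np.sum(i * P), np.sum(j * P)   # row/column means under P
    sd_r = np.sqrt(np.sum((i - mu_r) ** 2 * P))
    sd_c = np.sqrt(np.sum((j - mu_c) ** 2 * P))
    return {
        "contrast": np.sum((i - j) ** 2 * P),
        "inverse_difference_moment": np.sum(P / (1.0 + (i - j) ** 2)),
        "correlation": (np.sum(i * j * P) - mu_r * mu_c) / (sd_r * sd_c + 1e-9),
        "homogeneity": np.sum(P / (1.0 + np.abs(i - j))),
        "variance": np.sum((i - mu_r) ** 2 * P),
        "cluster_tendency": np.sum((i + j - 2 * mu_r) ** 2 * P),
    }
```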


SEGMENTATION UTILIZING SVM

SVMs (Support Vector Machines) implement an approximation of structural risk minimization. This general concept is based on the fact that the error rate of a learning machine on test data is bounded by the sum of the training-error rate and a term depending on the Vapnik-Chervonenkis dimension. SVMs excel at classifying data points compared with many other classifiers, as they do not assume anything about the priors and are discriminative in nature; they are also scalable and hence do not suffer from the curse of dimensionality when constructing the separating hyperplane in feature space. The segmentation algorithm operates on the textural features of the MR image by utilizing an SVM. Different types of brain tissue have different textural features: tissues like tumor, white matter, gray matter and CSF (Cerebrospinal Fluid) have different MR image textures and can therefore be represented by texture primitives that characterize the complete MR image dataset. The SVM model constructed from the training samples is used to perform the voxel-wise segmentation, and false-positive voxels are then eliminated to produce relatively accurate WML segmentation results. For the evaluation, a set of training samples was manually segmented by neuroradiologists and then compared with the classification model built from the manual segmentation results through the SVM.
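A minimal sketch of the voxel-wise classification step is shown below, using scikit-learn's SVC; the feature matrices, the RBF kernel and the label convention (1 = lesion) are illustrative assumptions rather than the settings used in this study.

```python
import numpy as np
from sklearn.svm import SVC

def train_and_segment(X_train, y_train, X_volume, volume_shape):
    """Train an SVM on labelled texture-feature vectors, then classify every voxel.

    X_train : (n_samples, n_features) texture features of manually labelled voxels
    y_train : labels from the manual segmentation (e.g. 1 = WML, 0 = other tissue)
    X_volume: texture features of every non-background voxel of the volume to segment
    """
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    labels = clf.predict(X_volume)              # one decision per voxel
    return labels.reshape(volume_shape)         # back onto the image grid
```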

CONCLUSIONS

This 3D approach to texture analysis could contribute to the development of 3D MR descriptions of tumor heterogeneity based on texture parameters. It still needs to be tested by further combining T2-weighted MRI, contrast-enhanced dynamic MRI, spectroscopic imaging and diffusion tensor imaging, and by applying these techniques to a wider range of patients. In all cases, 3D-COM either improves tumor characterization compared with 2D-COM, or demonstrates textural overlapping that validates the 2D-COM results. This higher accuracy results from weighting the texture characteristics over a larger 3D volume, which enhances the dominant texture characteristics captured by 3D-COM texture analysis.

REFERENCE

QUDDUS, A., FIEGUTH, P. AND BASIR, O., 2005. Adaboost and Support Vector Machines for White Matter Lesion Segmentation in MR Images. IEEE.
MAHMOUD-GHONEIM, D., TOUSSAINT, G., CONSTANS, J.-M. AND DE CERTAINES, J.D., 2003. Three dimensional texture analysis in MRI: a preliminary evaluation in gliomas. Elsevier.
SHARMA, N., RAY, A.K., SHARMA, S., SHUKLA, K.K., PRADHAN, S. AND AGGARWAL, L.M., 2008. Segmentation and Classification of medical images using texture-primitive features: Application of BAM-type artificial neural network. Journal of Medical Physics, vol. 33, no. 3.
HARALICK, R.M., SHANMUGAM, K. AND DINSTEIN, I., 1973. Textural Features for Image Classification. IEEE.
LAO, Z., SHEN, D., JAWAD, A., KARACALI, B., LIU, D., MELHEM, E.R., BRYAN, R.N. AND DAVATZIKOS, C., 2006. Automated Segmentation of White Matter Lesions in 3D Brain MR Images, Using Multivariate Pattern Classification. IEEE.

7
FINGERPRINT IMAGE RECONSTRUCTION AND ENHANCEMENT
Ghazali Sulong, Mohamad Kharulli Othman and Khairul Azlan Ali

INTRODUCTION

Fingerprint identification is one of the most important biometric technologies and has drawn a substantial amount of attention over the past decade. Progress has been made in several areas, including fingerprint enhancement, identification, classification, matching, and the development of commercial automated fingerprint identification systems. Despite these advances, there remain considerable opportunities for improvement: the retrieval speed and the ability to recognize partial or low-quality fingerprint images are prominent among the areas that require it. A fingerprint is a graphical pattern of ridges and valleys on the surface of a human finger. Due to their uniqueness and permanence, fingerprints are among the most reliable human characteristics that can be used for personal identification. The uniqueness of a fingerprint is determined by its local ridge characteristics and their relationships. The two most prominent features are the ridge ending and the ridge bifurcation, called minutiae. Most Automatic Fingerprint Identification Systems (AFIS) are based on minutiae matching. A good quality fingerprint typically contains about 40 to 100 minutiae.

The performance of fingerprint identification and verification systems relies heavily on the quality of the input fingerprint images. In an ideal fingerprint image, ridges and valleys alternate and flow in a locally constant direction; in such situations, the ridges can be easily detected and minutiae can be precisely located on the thinned ridges. In practice, however, due to variations in impression conditions, ridge configuration, skin conditions (aberrant formations of epidermal ridges, postnatal marks, occupational marks) and acquisition devices, a significant proportion of fingerprint images is of poor quality. The ridge structures in poor-quality fingerprint images are not always well defined and hence cannot be correctly detected. This leads to the following problems: (1) a significant number of fake minutiae may be created, (2) a large percentage of genuine minutiae may be ignored, and (3) large errors in minutiae position and orientation may be introduced. Therefore, a critical step in fingerprint identification is to correctly extract the minutiae. Minutiae extraction is an error-prone process which relies heavily on the quality of the fingerprint images: a poor-quality image badly affects the minutiae extraction and the ridge orientations, producing a large percentage of fake minutiae and eventually leading to errors in fingerprint matching. Therefore, in order to construct an effective fingerprint identification system, an effective enhancement algorithm is necessary. This chapter discusses an approach for fingerprint image enhancement and reconstruction, especially suitable for low-quality fingerprint images, which can improve the clarity and continuity of the ridge structures. There are two steps involved: (1) directional image computing, and (2) fingerprint image filtering.


DIRECTIONAL IMAGE

The directional image is a discrete matrix whose elements represent the local average directions of the fingerprint ridge lines. The orientation field of a fingerprint image represents an intrinsic property of the image: it defines invariant coordinates for ridges and furrows in a local neighbourhood, which is very important in fingerprint analysis. The accuracy of the directional image is critical and plays a very significant role in fingerprint recognition; if the directional image has fine and accurate elements, the enhanced fingerprint image will be of the best possible quality, and vice versa. Hong et al. (1998) developed an iterated least mean square estimation algorithm. They define the directional image as an N x N image in which \theta(i, j) represents the local ridge orientation at pixel (i, j). Usually the local ridge orientation is specified per region (block) rather than per pixel: the image is divided into w x w non-overlapping blocks and a single local ridge orientation is defined for each block. The main steps of the algorithm are as follows:

1. Divide the input fingerprint image into blocks of size w x w. For 500 dpi images, the initial value of w is 16.

2. Compute the gradients \partial_x(i, j) and \partial_y(i, j) at each pixel (i, j). The gradient operator may vary from the simple Sobel operator to the more complex Marr-Hildreth operator (D. Marr, 1982), depending on the computational requirement.

3. Estimate the local orientation of each block centered at pixel (i, j) by using the following equations (A. Rao, 1990):

V_x(i, j) = \sum_{u} \sum_{v} 2 \, \partial_x(u, v) \, \partial_y(u, v)          (1)

V_y(i, j) = \sum_{u} \sum_{v} \left( \partial_x^2(u, v) - \partial_y^2(u, v) \right)          (2)

\theta(i, j) = \frac{1}{2} \tan^{-1} \left( \frac{V_x(i, j)}{V_y(i, j)} \right)          (3)

where the sums run over the w x w block centered at (i, j) and \theta(i, j) is the least-squares estimate of the local ridge orientation of that block. Mathematically, it represents the direction orthogonal to the dominant direction of the Fourier spectrum of the w x w window.

4. Due to the presence of noise, corrupted ridge and valley structures, minutiae, etc. in the input image, the estimated local ridge orientation \theta(i, j) may not always be correct. Since the local ridge orientation varies slowly in a local neighbourhood in which no singular points appear, a low-pass filter can be used to correct the erroneous local ridge orientations. In order to perform the low-pass filtering, the orientation image needs to be converted into a continuous vector field, defined as follows:

\Phi_x(i, j) = \cos( 2\theta(i, j) )          (4)

\Phi_y(i, j) = \sin( 2\theta(i, j) )          (5)


where \Phi_x and \Phi_y are the x and y components of the vector field, respectively. The low-pass filtering can then be performed as follows:

\Phi'_x(i, j) = \sum_{u} \sum_{v} h(u, v) \, \Phi_x(i - u, j - v)          (6)

\Phi'_y(i, j) = \sum_{u} \sum_{v} h(u, v) \, \Phi_y(i - u, j - v)          (7)

where h(u, v) is a 2-dimensional low-pass filter with unit integral; its support specifies the size of the filter, with a default size of 5 x 5. Note that the smoothing operation is performed at the block level.

5. Compute the smoothed local ridge orientation at (i, j) using the equation below:

O(i, j) = \frac{1}{2} \tan^{-1} \left( \frac{\Phi'_y(i, j)}{\Phi'_x(i, j)} \right)          (8)
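The following sketch computes a block orientation field along the lines of steps 1-5 (equations (1)-(8)); scipy's Sobel operator stands in for the gradient operator of step 2 and a uniform filter stands in for the generic low-pass filter h(u, v), so names and defaults are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def block_orientation_field(img, w=16):
    """Least-squares block orientation (eqs. 1-3) with vector-field smoothing (eqs. 4-8)."""
    gx = ndimage.sobel(img.astype(np.float64), axis=1)   # x-gradient
    gy = ndimage.sobel(img.astype(np.float64), axis=0)   # y-gradient
    rows, cols = img.shape
    theta = np.zeros((rows // w, cols // w))
    for bi in range(rows // w):
        for bj in range(cols // w):
            sl = (slice(bi * w, (bi + 1) * w), slice(bj * w, (bj + 1) * w))
            vx = np.sum(2.0 * gx[sl] * gy[sl])           # eq. (1)
            vy = np.sum(gx[sl] ** 2 - gy[sl] ** 2)       # eq. (2)
            theta[bi, bj] = 0.5 * np.arctan2(vx, vy)     # eq. (3)
    # eqs. (4)-(8): smooth the doubled-angle vector field with a 5x5 low-pass filter
    phi_x = ndimage.uniform_filter(np.cos(2 * theta), size=5)
    phi_y = ndimage.uniform_filter(np.sin(2 * theta), size=5)
    return 0.5 * np.arctan2(phi_y, phi_x)                # one orientation per block
```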

Stock and Swonger (1969) proposed the directional mask. It was initially used to convert a gray-scale image into a binary image, and was later turned into a detector of ridge orientation at each pixel. Approaches that utilize a scheme similar to Stock and Swonger (1969) include Candela et al. (1995), Karu and Jain (1996) and Cappelli et al. (1998).


Figure 1a shows an example of a directional mask used in Candela et al. (1995). It has a size of 9 x 9, centered at C, and can estimate eight directions for each pixel. For each pixel in the image, the slit sums s_i (i = 1, ..., 8) are computed; each s_i is the sum of the values of the slit of four pixels labeled i. Let p >= 0 and q <= 7 be such that:

(9)

(10)

The direction at a pixel is defined as p if the centre pixel is located on a ridge (dark area) and as q if the centre pixel is located in a valley. If the centre pixel has value C, then its direction is given by:

(11)


Figure 1

Directional masks: a) 9x9 directional mask and b) 5x5 directional mask. Note that each matrix element represents an angle

Meltem Ballan et al. (1997) and Suliman M. Mohamed and Henry (2002) used a smaller variant of the directional mask (Figure 1b). Alternatively, the orientation of each pixel is estimated by taking the minimum absolute difference of the slits. An example of calculating the direction using slit sums is shown below. Firstly, the image is partitioned into blocks of 5 x 5 pixels. The procedure for calculating the pixel orientation is as follows: (a) Calculate the slit sums for each direction; in this case, only four slit sums are calculated.


(b) Find the minimum and maximum of the slit sums.

(c) Solve the inequality of equation (10).

(d) Since the left side is larger than the right side, the directional angle is 0.

The downside of this technique is the fixed number of possible directions, for instance four or eight.

FINGERPRINT FILTERING

Noise always exists in digital images. There are two noise removal approaches, namely the Spatial Domain and the Frequency Domain. The term Spatial Domain refers to the aggregate of pixels composing an image, and Spatial Domain methods are procedures that operate directly on these pixel values. There are three categories of Spatial Domain methods, i.e. Smoothing Filters, Sharpening Filters and Histogram Modelling.


The Smoothing Filter is used for blurring and reducing noise; blurring is a common pre-processing step to remove small details in an image. Examples of Smoothing Filter techniques are the Average, Minimum and Maximum Filters. The Sharpening Filter is used to highlight fine details or to enhance the details of fingerprint images; it seeks to emphasize changes in the original image. Examples of Sharpening Filters are the High-Pass and High-Boost Filters. Low-contrast images often occur due to poor or non-uniform lighting conditions, or to nonlinearity or a small dynamic range of the imaging sensor. One way to resolve this problem is to use Histogram Modelling. The histogram of an image represents the relative frequency of occurrence of the various grey levels in the image, and Histogram Modelling modifies an image so that its histogram has a desired shape; this is useful for stretching the low-contrast levels of images with narrow histograms. Histogram Equalization and Histogram Stretching are techniques based on Histogram Modelling. Gonzalez et al. (1992) define enhancement in the Frequency Domain as follows: (1) the Fourier transform of the image is obtained, then (2) the result of the transformation is multiplied by a filter function, and (3) the inverse transform is taken to produce the enhanced image. Sherlock et al. (1992) introduced a fingerprint enhancement algorithm for automatic fingerprint identification. It is based on a directional Fourier-domain filtering algorithm which determines the local Ridge Orientation (LRO) in 16 directions (\theta_i = i\pi/16, i = 0, ..., 15). The Fast Fourier Transform (FFT) is used to filter the fingerprint into the 16 directions, using a directional band-pass filter. Once 16 pre-filtered fingerprint images are obtained through the Inverse Fast Fourier Transform (IFFT), those 16 images are combined to obtain the filtered image as follows: the LRO is determined at each point by interpolation between 30 x 30 samples from the raw image data, and the filtered image is then formed by an appropriate combination of the pixel values of the pre-filtered images.


DeLaRue Printrak (Printrak, 1985) developed a fingerprint pre-processing filter using local Fast Fourier Transform (FFT) techniques. In this technique, the fingerprint image is divided into 32 by 32 tiles, starting from the upper left-hand corner, and the filter processes each tile in turn. Once a tile is processed, the filter shifts 24 pixels to the right to obtain the next 32 by 32 tile, so that the first 8 columns of the new tile are the last 8 columns of the previous tile. After reaching the right side of the image, the filter shifts down 24 pixels; the first 8 rows of the new tile are then the last 8 rows of the previous, vertically adjacent tile. The process continues until the whole image has been processed in this manner. Each tile is filtered individually by FFT: the FFT of the tile is computed, the lowest and highest ranges of spatial frequencies are set to zero, and the power spectrum of the FFT is then computed as

P = X^2 + Y^2          (12)

where X is the real part of the FFT and Y is the imaginary part. The elements of P are raised to a power and multiplied by the FFT elements X + iY to obtain new elements U + iV as follows:

(13)

(14)

To obtain the filtered tile, the inverse FFT of U + iV is computed and the real part is used to reconstruct the filtered tile. In the image reconstruction, the centre 24 x 24 pixels are saved and the outer 4 rows and columns are discarded from each filtered 32 x 32 pixel tile.

Owing to the nature of fingerprints, scars are inevitable, and they can introduce fake minutiae in the feature extraction process. In order to prevent this, the directional image is obtained and modified. Using a 320 x 240 image, Wahab et al. (1998) divided the image into 40 x 30 small areas of 8 x 8 pixels each. Each area is assigned a directional code to represent the ridge orientation in that area; to reduce the computational time, a total of eight directional codes are used. The eight directional windows, each with a length of 16 pixels, are shown in Figure 2.

Figure 2

Eight directional windows (Wahab et al., 1998)

In finding the ridge direction, each directional window w_d is moved in the direction tangential to the direction of the window. Since small areas of 8 x 8 pixels are used, each w_d must be moved 8 times to cover the entire area. At each position of w_d, the mean gray-level value M(w_d) is calculated. The fluctuation of M(w_d) is expected to be largest when the movement of the directional window is orthogonal to the direction of the ridges; therefore, the area is assigned ridges in the direction d for which the fluctuation of M(w_d) is largest. Because of noise and scars in the fingerprint, some areas of the directional image contain incorrect directional codes. To fix this problem, the directional codes in such areas are modified by referring to the directional codes in the neighbouring areas. This is achieved by building a directional histogram N(d) of an area and its neighbours, where N(d) is the number of areas with directional code d. The largest value is identified as D1 and the second largest value as D2. D(x, y) is the modified directional code of the pixel (x, y). The criteria used to modify the directional code are:

i. D(x, y) = D1, if 5 <= N(D1) <= 8

ii. D(x, y) = |(D1 + D2) / 2|, if 3 <= N(D1) <= 5 and 3 <= N(D2) <= 5 and |D1 - D2| <= 2

iii. D(x, y) = D(x, y) otherwise.

The directional image is then reconstructed by moving the selected directional window through the central area and replacing the pixels with those of the directional window. At every position of the shift, pixels are added if a continuation of the ridge line is detected; this is checked by comparing the pixels of the directional window with those of the neighbouring areas. Otherwise, the position remains empty, signifying the gap between ridge lines.

Ikonomopoulos et al. (1984, 1985) and Kunt et al. (1985) introduced a directional filtering approach for texture discrimination. In their work, they isolated the edge information belonging to a limited number of directions by using a directional filter whose frequency responses cover a set of frequencies lying within a directional range; this is used here as the directional filter. In creating a new fingerprint image there are two steps: 1) frequency transformation, and 2) directional Fourier filtering. The Fourier spectrum is used to improve fingerprint images because ridges and valleys alternate and flow in a locally parallel direction that can be represented by frequency-like components (Figure 3).


Figure 3

Frequency-like components of Ridges and furrows (Hong, 1998)

To transform the fingerprint image into the frequency domain, the Fast Fourier Transform (FFT) is used. Figure 4 shows a fingerprint image in the frequency domain. By modifying the frequency components of an image, periodic noise can be reduced and the image can be enhanced. The general definition of the directional filtering is:

(15)

where H is the filter function, W is the Fourier transform of the fingerprint image and G_i is the i-th directional filter. The filter function H is defined by:

(16)


Figure 4

Image transformations

(17)


The explanation of this equation is as follows: u and v denote the spatial frequency coordinates in the x- and y-planes, n denotes the number of directions, the orientation of the i-th filter is offset by (i - 1)\pi/2n, and the remaining term denotes the ideal frequency response. Figure 5 shows the ideal frequency response of the directional filter defined by Ikonomopoulos et al. (1984).

Figure 5

A directional filter developed by Ikonomopoulos et al. (1984)

Where


(18)

By using this filter, the frequency image is filtered into each of the eight directions, which are obtained from the directional image computed in the previous stage. After the filtering process, the frequency image is transformed back into the spatial domain using the Inverse Fast Fourier Transform, and a new fingerprint image of fine quality is obtained. The whole process is summarized in Figure 6. Figure 7 shows that the noise effects caused by sweat holes and by non-uniform ridge and furrow widths are improved, except for the scar effects (green box); the failure of scar removal is caused by wrong elements in the directional image (Figure 8).
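A sketch of this FFT → directional filtering → IFFT pipeline is given below. It does not reproduce the exact transfer function of equations (15)-(18); a simple angular band-pass mask is used instead, and the per-pixel orientation map (in radians, same shape as the image), the number of directions and the bandwidth are illustrative assumptions.

```python
import numpy as np

def directional_fourier_filter(img, orientation, n_dirs=8, half_width=np.pi / 8):
    """Pre-filter the spectrum into n_dirs directions, then pick, per pixel, the
    pre-filtered image whose direction matches the local ridge orientation."""
    F = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    rows, cols = img.shape
    v = (np.arange(rows) - rows // 2)[:, None]
    u = (np.arange(cols) - cols // 2)[None, :]
    spec_angle = np.arctan2(v, u)               # direction of each frequency sample

    prefiltered = []
    for i in range(n_dirs):
        center = i * np.pi / n_dirs             # i-th filter direction
        # angular distance to the filter direction, folded modulo pi
        d = np.abs(np.angle(np.exp(1j * 2.0 * (spec_angle - center)))) / 2.0
        mask = (d < half_width).astype(np.float64)
        prefiltered.append(np.real(np.fft.ifft2(np.fft.ifftshift(F * mask))))

    # index of the pre-filtered image matching the local orientation in [0, pi)
    idx = (np.round(orientation / (np.pi / n_dirs)).astype(int)) % n_dirs
    stack = np.stack(prefiltered)               # shape (n_dirs, rows, cols)
    return np.take_along_axis(stack, idx[None, :, :], axis=0)[0]
```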

CONCLUSIONS

The above results have shown that the noise effects caused by sweat spots and by non-uniform ridge and furrow widths are significantly improved. However, the technique failed to remove the scars, which is due to wrong elements in the directional image. This problem could be solved if a perfect directional image were used; however, that is a very challenging task which requires great effort and determination, and the idea is left open for future research.


Figure 6 A Fingerprint image reconstruction process


Figure 7 below shows the result of fingerprint image reconstruction.

Figure 7

Result from fingerprint image reconstruction. a) original fingerprint, b) reconstructed fingerprint


Figure 8

Directional Image, a) Original fingerprint image, b) Directional Image Mapping Reference

REFERENCE

CANDELA, G.T., GROTHER, P.J., WATSON, C.I., WILKINSON, R.A. AND WILSON, C.L. (1995). PCASYS - A Pattern-Level Classification Automation System for Fingerprints. Technical Report NISTIR 5647.
CAPPELLI, R., LUMINI, A., MAIO, D. AND MALTONI, D. (1999). Fingerprint Classification by Directional Image Partitioning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5): 402-421.
DELARUE PRINTRAK INC. (1985). Automated Classification System Reader Project (ACS). Technical Report, February.
GONZALEZ, R.C. AND WOODS, R.E. (1992). Digital Image Processing. Addison-Wesley Publishing Company.
HONG, L., WAN, Y. AND JAIN, A. (1997a). Fingerprint Image Enhancement: Algorithm and Performance Evaluation. Pattern Recognition and Image Processing Laboratory, Department of Computer Science.
HONG, L. (1998). Automatic Personal Identification Using Fingerprints. Ph.D. Dissertation, Michigan State University.
IKONOMOPOULOS, A. AND UNSER, M. (1984). A Directional Filtering Approach to Texture Discrimination. Proceedings of the Seventh International Conference on Pattern Recognition, Montreal, Canada, 30 July - 2 August, pp. 87-89.
IKONOMOPOULOS, A. AND KUNT, M. (1985). High Compression Image Coding via Directional Filtering. Signal Processing 8, North-Holland, pp. 179-203.
KARU, K. AND JAIN, A.K. (1996). Fingerprint Classification. Pattern Recognition, 29(3): 389-404.
KUNT, M., IKONOMOPOULOS, A. AND KOCHER, M. (1985). Second-Generation Image Coding Techniques. Proceedings of the IEEE, vol. 73, no. 4, pp. 549-574, April 1985.
STOCK, R.M. AND SWONGER, C.W. (1969). Development and evaluation of a reader of fingerprint minutiae. Cornell Aeronautical Laboratory, Technical Report CAL No. XM-2478-X-1: 13-17.
SHERLOCK, B.G., MONRO, D.M. AND MILLARD, K. (1994). Fingerprint Enhancement by Directional Fourier Filtering. IEE Proc. Visual Image Signal Processing, 141(2): 87-94.
WAHAB, A., CHIN, S.H. AND TAN, E.C. (1998). Novel Approach to Automated Fingerprint Recognition. IEE Proc-Vis., vol. 145, no. 3.


8
FINGERPRINT SINGULARITY AND CORE POINT DETECTION
Fadzilah Ahmad and Dzulkifli Mohamad

INTRODUCTION

Fingerprint has been widely applied as a personal identification. Due to reliability and uniqueness features. In fingerprint, there are two kinds of features: the global feature and local feature. The global feature includes the ridge orientation map, core and delta locations, while the local feature form by minutiae points. Singular points are the most important global features that contain the significant global information which play an important role in fingerprint pattern classification (Hong and Jain, 1999) and fingerprint matching (Sharath et al., 2000). One of the features of fingerprint identification and verification is singularity (see Figure 1.0). The accuracy of singularity extraction basically depends on the quality of images. Therefore, in order to improve the identification and verification process, we need to enhance the fingerprint image. The poor quality of fingerprint image makes efficient singularity extraction algorithm degrades rapidly and we cannot identify the singular points area efficiently. A majority of techniques that used to


enhance fingerprint images are based on contextual filters whose parameters depend on the local ridge frequency and orientation. The filters themselves may be defined in the spatial domain (O'Gorman and Nickerson, 1989; Jain et al., 1998) or based on Fourier domain analysis (Sherlock et al., 1994; Watson et al., 1994). Singular points are the discontinuities in the orientation field of a fingerprint. Core and delta points are two different types of singular points, as can be seen in Figure 1. The core point is defined as the topmost point on the innermost upward recurving ridge, and the delta point is defined as the point of bifurcation (in a delta-like region) on a ridge splitting into two branches which extend to encompass the complete pattern area (Srinivasan and Murthy, 1992).
Figure 1  The fingerprint image with core and delta points

Core and delta points both have distinctive features. However, the core point is more often used than the delta point as a reference point for fingerprint matching and classification (Karu and Jain, 1996). Therefore, in order to align the


fingerprint images used in minutiae-based algorithms, the core point position can be used as a reference point. Most fingerprint identification systems use a reference point to extract fingerprint features. Consequently, accurate and consistent reference point detection considerably affects the overall fingerprint identification system.

PREVIOUS WORK

It is a common trend in fingerprint reference point detection algorithms to use the Poincare index (PI) to detect the core point. This method simply computes the PI of each block by summing up the direction changes around a closed digital curve surrounding the block. Previous researchers (Kawagoe and Tojo, 1984; Karu and Jain, 1996; Zhang et al., 2001; Hong and Jain, 1999) proposed the Poincare index method as an elegant and practical way to detect the core point location. Besides locating the singular points, the PI can also be used to judge their type. Generally, most of the approaches proposed for singularity detection operate on the ridge orientation image. In order to locate the position of singular points accurately, a reliable orientation field of the fingerprint image must be computed and fully exploited. However, improving the reliability of the orientation field image is still a challenging task because of noise and poor fingerprint image quality. The Poincare index method suffers from this problem, as it is easily affected by noise in the orientation field image and may detect false singularities (see Figure 2). In view of this, an interesting implementation of the Poincare index method for locating singular points was proposed by Bazen and Gerez (2002).


Figure 2

a) A poor quality fingerprint; b) the singularities of the fingerprint in a) are extracted through the Poincare method (circles highlight the false singularities); c) the orientation image has been regularized and the Poincare method no longer provides false alarms (Maltoni et al. 2003)

The multi-resolution approach is another technique to determine singularities, initiated by Koo and Kot (2001). An earlier exploration of the multi-resolution approach to locate the singularities of fingerprint images was proposed by Jain et al. (2000). This method is based on locating the maximum ridge curvature in the orientation image (Xudong Jiang et al., 2004). The presence of noise in the fingerprint image affects the curvature computation, which is sensitive to noise in the orientation field image; therefore this method does not work well for poor quality fingerprint images. As stated in Xiao Xu (2006), the multi-resolution approach cannot find an accurate core point location and it is still hard to decide the size of the reference point region.


Since singular points are defined as discontinuities in the orientation field, many researchers extract the singular point locations from the orientation field image. Another approach that uses the orientation field to detect singular points was proposed by Nilsson and Bigun (2002). The authors implemented a complex filtering technique at multiple resolution scales. This method is time consuming, and it is hard for it to give a good localization of the singular points. Park et al. (2003) introduced an efficient reference point detection algorithm using orientation pattern labeling. This method is not robust to fingerprint rotation, and its result for plain arch fingerprints is inferior to that of the method proposed by Jain et al. (2000). Manhua Liu et al. (2005) used a multiscale-analysis approach to locate singular points efficiently and consistently for all types of fingerprint images; this approach relies on the reliability of the orientation field of the fingerprint images. Another reference point localization technique, proposed by Xudong Jiang et al. (2004), is based on a hierarchical analysis of orientation coherence. However, this method needs consistent local orientations to obtain an accurate location of the singular points. To improve this, the authors proposed an orientation smoothing method to attenuate the noise of the orientation field and compute a reliable orientation field. In fact, an enhancement process of the fingerprint image is required in order to improve the clarity of the ridge structures. As stated in Mohammed S. Khalil et al. (2007), a Gabor filter is applied to enhance the image using the estimated ridge frequency, and the authors proposed a novel method for fingerprint reference point localization by calculating the reliability of the enhanced image. Manhua Liu et al. (2005) calculate the reliability to measure the accuracy of the orientation. Other authors, Xinjian Chen et al. (2005), also used the reliability as a threshold value for the image. From our study, we can conclude that there are four main approaches to locating the singular points (Jain et al., 1999), namely mathematical model representation, statistical approaches, methods based on the Fourier transform, and methods based on fingerprint


structures. According to Sherlock and Monro (1993) and Vizcaya and Gerhardt (1996), the representation of an accurate model for fingerprint images is a difficult task due to the presence of noise in fingerprint images. Methods based on statistical approaches cannot adapt themselves to different image characteristics, even though they use a histogram to attenuate the noise effect. The Fourier transform method is not efficient enough for locating the singular points because it works in the frequency domain, although some researchers have claimed to obtain good results with it. In general, methods applied to fingerprint structures have been tested on large databases and were successful (Karu and Jain, 1996; Ratha et al., 1996; Cho et al., 2000). Figure 3 shows the timeline of previous work on singular point detection techniques.


Figure 3

The timeline of previous work on singular point detection techniques


CORE POINT DETECTION TECHNIQUE

A common major step in core point detection is the estimation of the orientation field of the fingerprint image (Sen Wang et al., 2002; Maio and Maltoni, 2003). The orientation simply refers to the direction of the ridges in the image, and it can be exploited by different techniques, namely the Poincare Index method, Detection of Curvature, and the Geometry of Region technique. The following subsections elaborate these three techniques as implemented by researchers to detect the core point location; a sketch of the gradient-based orientation estimation that all of them rely on is given first.
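The sketch below is a minimal illustration of the least-squares (gradient-based) block orientation estimation mentioned above; it assumes a grayscale fingerprint image stored as a NumPy array, and the block size is an illustrative parameter rather than a value prescribed by the cited papers.

```python
import numpy as np

def orientation_field(img, block=16):
    """Gradient-based least-squares block orientation estimation (a sketch).

    img   : 2D grayscale fingerprint image as a float array
    block : block size in pixels (illustrative choice)
    Returns one ridge orientation (radians) per block.
    """
    gy, gx = np.gradient(img.astype(float))        # pixel gradients along rows and columns
    rows, cols = img.shape[0] // block, img.shape[1] // block
    theta = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * block, (i + 1) * block),
                  slice(j * block, (j + 1) * block))
            vx = np.sum(2.0 * gx[sl] * gy[sl])
            vy = np.sum(gx[sl] ** 2 - gy[sl] ** 2)
            # dominant gradient direction; ridge orientation is perpendicular to it
            theta[i, j] = 0.5 * np.arctan2(vx, vy) + np.pi / 2.0
    return theta
```

The resulting block orientation field is the input assumed by the three techniques below.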

Poincare Index (PI)

The overall steps of the PI technique, as described by Hong et al. (1999) and Sen Wang and Yangsheng Wang (2004), are as follows:
1. Let theta(i, j) be the orientation field, estimated by the least squares estimation algorithm mentioned in (Sharath et al., 2000; Sen Wang et al., 2002).
2. Initialize a label image A which is used to indicate the core point. For each pixel of theta(i, j), compute the Poincare index PI(x, y) as defined in (Sen Wang et al., 2002; Sen Wang and Yangsheng Wang, 2004):

\mathrm{Poincare}(x, y) = \frac{1}{2\pi} \sum_{k=0}^{N-1} \Delta(k) \qquad (1)

where


\Delta(k) = \begin{cases} \delta(k) & \text{if } |\delta(k)| < \pi/2 \\ \pi + \delta(k) & \text{if } \delta(k) \le -\pi/2 \\ \pi - \delta(k) & \text{otherwise} \end{cases} \qquad (2)

and

\delta(k) = \theta(x_{(k+1) \bmod N}, y_{(k+1) \bmod N}) - \theta(x_k, y_k) \qquad (3)

3. As mentioned in Kawagoe and Tojo (1984), a core point block should yield a Poincare index close to 0.5 (here taken as the range 0.40-0.51). If it does, the corresponding A(i, j) is labelled 1, otherwise 2. If the Poincare index is -0.5, the block is a delta block.
4. Then the centre of the labelled object is calculated. The largest region of blocks with value 1 and its centre are taken to locate the core point; if there is more than one such block, the average of their positions is used.
5. The centre location gives the core point location, as shown in Figure 4. However, the first detected core point may be slightly in error. To overcome this problem, core point tuning is performed on that region, as described in (Nabeel Younus et al., 2007).


Figure 4  Core point detection: a) detected core point; b) tuned core point
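As an illustration of steps 1-5, the following minimal sketch computes the Poincare index of equation (1) around one block of an orientation field. The eight-neighbour closed curve and the mod-pi normalization of the angle differences are the usual conventions and are assumptions here, not the exact implementation of the cited papers.

```python
import numpy as np

def poincare_index(theta, i, j):
    """Poincare index of the orientation field theta (radians) at block (i, j).

    Sums the normalized orientation changes along the closed curve formed by the
    eight neighbouring blocks (cf. equations (1)-(3)). A value near +0.5 indicates
    a core block, near -0.5 a delta block.
    """
    # closed digital curve around (i, j), traversed counter-clockwise
    ring = [(-1, -1), (0, -1), (1, -1), (1, 0),
            (1, 1), (0, 1), (-1, 1), (-1, 0)]
    angles = [theta[i + di, j + dj] for di, dj in ring]
    n = len(angles)
    total = 0.0
    for k in range(n):
        d = angles[(k + 1) % n] - angles[k]
        # bring the difference of mod-pi orientations into (-pi/2, pi/2]
        if d <= -np.pi / 2:
            d += np.pi
        elif d > np.pi / 2:
            d -= np.pi
        total += d
    return total / (2 * np.pi)

# usage (hypothetical): theta = orientation_field(img); poincare_index(theta, 10, 12)
```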

Detection of Curvature (DC)

According to Sen Wang et al. (2002) and Sen Wang and Yangsheng Wang (2004), the DC technique can be summarized as follows:
1. Using the equation in Sharath et al. (2000), compute the local orientation theta(i, j). The input block size is defined to be small, w = 3, i.e. k x l = 3 x 3 pixels.
2. Smooth the orientation field, using the equation in Sharath et al. (2000), in order to obtain a better result.


3. For every block defined previously, compute the differences of the direction components via the following equations:

\mathrm{Diff}_Y = \sum_{k=1}^{3} \sin 2\theta(k, 3) - \sum_{k=1}^{3} \sin 2\theta(k, 1) \qquad (4)

\mathrm{Diff}_X = \sum_{l=1}^{3} \cos 2\theta(3, l) - \sum_{l=1}^{3} \cos 2\theta(1, l) \qquad (5)

4. The core point is located at the pixel (i, j) where Diff_X and Diff_Y are both negative.
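A minimal sketch of steps 3-4, assuming a smoothed orientation field theta (radians) indexed by block; the exhaustive scan over blocks is an illustrative addition on top of the equations above.

```python
import numpy as np

def detect_core_dc(theta):
    """Detection-of-Curvature core search over a smoothed orientation field.

    For each 3x3 neighbourhood of blocks, Diff_Y and Diff_X (equations (4) and (5))
    are computed; the first block where both are negative is returned as the
    core point candidate, or None if no such block exists.
    """
    rows, cols = theta.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            w = theta[i - 1:i + 2, j - 1:j + 2]          # 3x3 block window
            diff_y = np.sum(np.sin(2 * w[:, 2])) - np.sum(np.sin(2 * w[:, 0]))
            diff_x = np.sum(np.cos(2 * w[2, :])) - np.sum(np.cos(2 * w[0, :]))
            if diff_x < 0 and diff_y < 0:
                return i, j
    return None
```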

Geometry of Region Technique (GR)

The geometry of regions of the fingerprint image plays an important role in detecting the core point, since the ridge curvature varies sharply near the core region (Maio and Maltoni, 2003). The GR technique can be explained as follows.
1. Compute the smoothed orientation field theta(i, j) using the equation below (Sharath, 2000):

\theta(i, j) = \frac{1}{2} \tan^{-1}\!\left( \frac{V_y(i, j)}{V_x(i, j)} \right) \qquad (6)

where V_x and V_y are the summed gradient components of the block.

2. Using the equation below (Sharath, 2000), compute epsilon(i, j), the sine component of theta(i, j):

\varepsilon(i, j) = \sin(\theta(i, j)) \qquad (7)


3. Initialize a label image A which is used to indicate the core point.
4. Assign to each pixel of A the difference between the sine component integrated over two regions R1 and R2:

A(i, j) = \sum_{R_1} \varepsilon(i, j) - \sum_{R_2} \varepsilon(i, j) \qquad (8)

The regions R1 and R2 are determined empirically so as to capture the maximum curvature in the concave ridges of the fingerprint and to contain at least one ridge.
5. Find the pixel (i, j) with the maximum value in A and assign it as the core point.
6. If the core point still cannot be located successfully, steps (1)-(5) are iterated a number of times while decreasing the window size used in step 1.
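A minimal sketch of steps 1-5 follows. The shapes of R1 (the region below each pixel) and R2 (the region above it), as well as the radius, are assumptions made here for illustration, since the chapter only states that the regions are chosen empirically.

```python
import numpy as np

def detect_core_gr(theta, radius=6):
    """Geometry-of-Region core search (steps 1-5 above, sketched).

    theta  : smoothed orientation field in radians
    radius : empirical size of the regions R1/R2 (assumed value)
    R1 is taken as the rectangular region below each pixel and R2 the region
    above it, so A(i, j) peaks where concave ridge curvature is maximal.
    """
    eps = np.sin(theta)                      # sine component, equation (7)
    rows, cols = theta.shape
    A = np.full((rows, cols), -np.inf)
    for i in range(radius, rows - radius):
        for j in range(radius, cols - radius):
            r1 = eps[i + 1:i + radius + 1, j - radius:j + radius + 1]  # region below
            r2 = eps[i - radius:i, j - radius:j + radius + 1]          # region above
            A[i, j] = r1.sum() - r2.sum()                              # equation (8)
    return np.unravel_index(np.argmax(A), A.shape)
```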

CONCLUSIONS

The localization of the core point can be highly accurate when a fine orientation estimate is available. Extracting a fine and reliable orientation estimate is a challenging task, especially for poor quality fingerprint images. To address this, an enhancement process needs to be carried out first to increase the accuracy of the orientation estimation and obtain better core point detection results. In this chapter, we presented various methods for detecting the core point location. Our approach uses the simple Poincare index


method with some modifications to provide better results. We prefer the Poincare index since it is much faster than the DC technique, as compared by Nabeel Younus et al. (2007); the DC approach works on small windows, so its core point detection is quite slow. Through several experiments, the GR technique showed fairly good results in detecting the core point; however, in some cases it cannot locate the singular points accurately. Currently, modifications of several core point detection methods are still required to improve the accuracy of fingerprint verification systems.

REFERENCE

BAZEN, A.M. AND GEREZ, S.H., 2002. Systematic Methods for the Computation of the Directional Fields and Singular Points of Fingerprints. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 905-919.
CHO, B.H., KIM, J.S., BAE, J.H., BAE, I.G. AND YOO, K.Y., 2000. Core-Based Fingerprint Image Classification. Proceedings of the 15th IEEE International Conference on Pattern Recognition, pp. 863-866.
HONG, L. AND JAIN, A.K., 1999. Classification of Fingerprint Images. Proceedings of the 11th Scandinavian Conference on Image Analysis.
HONG, L., YIFEI, W. AND JAIN, A.K., 1998. Fingerprint Image Enhancement: Algorithm and Performance Evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 777-789.
JAIN, A.K., SALIL PRABHAKAR, LIN HONG AND SHARATH PANKANTI, April 1999. A Multichannel Approach to Fingerprint Classification. IEEE Transactions on PAMI, vol. 21, no. 4, pp. 348-359.


JAIN, A.K., SALIL PRABHAKAR, LIN HONG AND SHARATH PANKANTI, May 2000. Filterbank-Based Fingerprint Matching. IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 846-859.
KARU, K. AND JAIN, A.K., 1996. Fingerprint Classification. Pattern Recognition, vol. 29, no. 3, pp. 389-404.
KAWAGOE AND TOJO, 1984. Fingerprint Pattern Classification. Pattern Recognition, vol. 17, no. 3, pp. 295-303.
KOO, W.M. AND KOT, A., 2001. Curvature Based Singular Points Detection. 3rd International Conference on Audio- and Video-Based Biometric Person Authentication, Lecture Notes in Computer Science (LNCS) 2091, pp. 229-234.
MALTONI, D., MAIO, D., JAIN, A.K. AND PRABHAKAR, S., June 2003. Handbook of Fingerprint Recognition. Springer-Verlag.
MANHUA LIU, XUDONG JIANG AND ALEX CHICHUNG KOT, 2005. Nonlinear Fingerprint Orientation Smoothing by Median Filter. ICICS 2005, pp. 1439-1443.
MOHAMMED S. KHALIL, DZULKIFLI MUHAMMAD AND M. MASROOR AHMED, 2007. Reliable Method to Locate Singular Point. Universiti Teknologi Malaysia.
NABEEL YOUNUS KHAN, M. YOUNUS JAVED, NAVEED KHATTAK AND UMER MUNIR, 2007. IEEE Digital Image Computing: Techniques and Applications, pp. 260-266.
NILSSON, K. AND BIGUN, J., 2002. Complex Filters Applied to Fingerprint Images Detecting Prominent Symmetry Points Used for Alignment. Proc. Workshop on Biometric Authentication (in ECCV 2002), LNCS 2359, pp. 39-47, Springer-Verlag, New York.
O'GORMAN, L. AND NICKERSON, J.V., 1989. An Approach to Fingerprint Filter Design. Pattern Recognition, vol. 22, no. 1, pp. 29-38.
PARK, C.H., OH, S.K., KWAK, D.M., KIM, B.S., SONG, Y.C. AND PARK, K.H., 2003. A New Reference Point Detection Algorithm Based on Orientation Pattern Labeling in Fingerprint Images. Proc. of the 1st Iberian Conf. on Pattern Recognition and Image Analysis, Puerto de Andratx, Spain.


RATHA, N.K., KARU, K., CHEN, S. AND JAIN, A.K., 1996. A Real-Time Matching System for Large Fingerprint Databases. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 799-813.
SEN WANG AND YANGSHENG WANG, January 2004. Fingerprint Enhancement in the Singular Point Area. IEEE Signal Processing Letters, vol. 11, no. 1.
SEN WANG, WEI WEI ZHANG AND YANG SHENG WANG, 2002. Fingerprint Classification by Directional Fields. Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI'02).
SHERLOCK, B.G. AND MONRO, D.M., 1993. A Model for Interpreting Fingerprint Topology. Pattern Recognition, vol. 26, no. 7, pp. 1047-1055.
SHERLOCK, B.G., MONRO, D.M. AND MILLARD, K., 1994. Fingerprint Enhancement by Directional Fourier Filtering. IEE Proceedings: Vision, Image and Signal Processing, vol. 141, no. 2, pp. 87-94.
SRINIVASAN, V.S. AND MURTHY, N.N., 1992. Detection of Singular Points in Fingerprint Images. Pattern Recognition, vol. 25, no. 2, pp. 139-153.
VIZCAYA, P.R. AND GERHARDT, L.A., 1996. A Nonlinear Orientation Model for Global Description of Fingerprints. Pattern Recognition, vol. 29, no. 7, pp. 1221-1231.
WATSON, C.I., CANDELA, G.I. AND GROTHER, P.J., 1994. Comparison of FFT Fingerprint Filtering Methods for Neural Network Classification. Tech. Report NIST TR 5493.
XIAO XU AND KAMATA, S., 2006. Fast and Accurate Singular Point Extraction of Fingerprint. The 8th International Conference on Signal Processing.
XINJIAN CHEN, XUDONG JIANG, JIE TIAN, YANGYANG ZHANG AND XIN YANG, 2005. A Robust Orientation Estimation Algorithm for Low Quality Fingerprints. Advances in Biometric Person Authentication, vol. 3781/2005, pp. 95-102.


XINJIAN CHEN, JIE TIAN, YANGYANG ZHANG AND XIN YANG, 2005. Enhancement of Low Quality Fingerprints Based on Anisotropic Filtering. Advances in Biometrics, vol. 3832/2005, pp. 302-308.
XUDONG JIANG, MANHUA LIU AND ALEX CHICHUNG KOT, 2004. Reference Point Detection for Fingerprint Recognition. Proceedings of the 17th IEEE International Conference on Pattern Recognition (ICPR'04).
ZHANG, Q., HUANG, K. AND YAN, H., 2001. Fingerprint Classification Based on Extraction and Analysis of Singularities and Pseudoridges. Pan-Sydney Area Workshop on Visual Information Processing (VIP2001), vol. 11, Sydney, Australia.


9
REVIEW OF LENGTH ESTIMATORS IN 2D
Oldooz Dianat and Habibollah Haron

INTRODUCTION

Images are usually represented in digitized form. Because of this digitization, discrete geometry must be applied to extract quantitative information such as feature size or length (V. Toh, C.A. Glasbey et al. 2003). Although digitization discards knowledge of the underlying continuous shape, discrete geometric estimators make it possible to obtain feature sizes, and this often involves curve length estimation. In discrete geometry, features are categorized into two groups: global features, such as area, perimeter and moments, and local features, such as tangents, normals and curvatures (Francois de Vieilleville, Jacques-Olivier Lachaud et al. 2005). The estimation of these geometric features is based on shapes or curves. Shape is a major feature in visual databases such as images, graphics and video; it carries meaningful information about the associated visual objects. There are a variety of schemes for shape representation. One of the widely used schemes is the chain code, which is used for coding purposes as well as for syntactical analysis of a digital line or curve (Yi Xiao, Ju jia Zou et al. 2001).


Most length estimators are based either on pixel counts or on properties of the chain code representation of a digital line or curve. The process consists of, first, object segmentation; second, extraction of pixel-based measurements; and then conversion of these into an estimate of actual size (V. Toh, C.A. Glasbey et al. 2003). We restrict our study to this process. It is divided into the discretization or digitization of a curve, the chain code representing this digitization and, finally, the length estimators applied to this chain code to estimate the length of the curve.
DISCRETIZATION

An abundant literature is devoted to the study of discretization schemes. Let E be a Euclidean space, and let D be a discrete space related to E. Typically, one can take E = R^n and D = Z^n (n = 2, 3). A discretization scheme associates to each subset X of E a subset D(X) of D, which is called the discretization of X. Different discretization schemes have been proposed and compared with respect to some fundamental geometrical, topological and structural properties. A well-defined discretization has to preserve the symmetry and connectivity of the continuous shape (Michel Couprie, Gilles Bertrand et al. 2003). There are different kinds of discretization. One popular discretization scheme for lines, called grid intersection digitization (Figure 1), guarantees that the discretization of a straight line is a digital curve in the sense of digital topology. A proof of this property can be found in (R. Klette 1985). The drawback of this discretization scheme is its lack of symmetry: for any intersection of the line with a pixel boundary, the pixel vertex closest to this intersection is chosen as an element of the discretization, and if the intersection is at equal distance between two vertices, an arbitrary choice is made (for example, the rightmost or upmost vertex). This drawback is shared by other discrete models for straight lines and planes, such as Bresenham's model (Bresenham 1965), the naive model and the


standard model. Despite this shortcoming, this discretization remains popular because of its simplicity and linear algorithm.

Figure 1

Grid intersection digitization (Reinhard Klette and Rosenfeld 2004)

Another kind of discretization is based on the definition of the supercover; the supercover does not suffer from this lack of symmetry. But the supercover also has a drawback for thin objects such as straight lines: if a straight line in R^2 goes through a point with integer coordinates, then its supercover contains the four pixels that cover this point; this configuration is called a bubble (Michel Couprie, Gilles Bertrand et al. 2003). The natural way of digitizing a region in the Euclidean plane consists in subdividing the plane into regularly spaced elementary areas corresponding to the pixels and in measuring for


each pixel the portion covered by the given Euclidean region. The measured value is then compared with a threshold: a pixel is set to 1 if the measured portion is greater than the threshold and to 0 otherwise. This method is called digitizing by thresholding (Kovalevsky 1997).

CHAIN CODE

Coded contours provide a compact region representation suitable for detecting features such as corners, area, perimeter, moments, centres, eccentricity, projections and straight line segments, which can be used for shape analysis and shape-based pattern recognition (G. Wagenknecht 2007). There are many techniques available to describe boundary curves, such as B-splines, Fourier descriptors, chain codes, polygonal approximation, the curvature primal sketch, the medial axis transform and autoregressive models, among others (Fernand S. Cohen and Wang 1994). The chain code is one of the traditional image data structures; chains are used for the description of object borders in image processing (J. Kormos and Vereb 2003). A chain can be viewed as a connected sequence of straight line segments with specified lengths and directions. Basically, the chain coding algorithm is a sequential approach to contour tracing. Contour tracing methods were developed for segmenting and representing regions by their closed contours for image compression, shape-of-object representation, object recognition and contour-based region analysis (G. Wagenknecht 2007). The algorithm traces the border pixels one by one and generates codes by considering the neighbourhood allocation (Frank Y. Shih and Wong 2001).


The first boundary code was developed by Freeman for pixel-based contours. The path through the centres of connected boundary pixels is represented by an eight-element code, called the chain or Freeman code. This representation is based on the connectivity definition of neighbouring pixels. For the Freeman chain code, the prerequisites are conditions for the start pixel, the neighbourhood of contour pixels, the direction in which the contour must be traced, and the stopping criterion (G. Wagenknecht 2007). Figure 2 shows the numbering scheme. The absolute coordinates of the start pixel (e.g., the top leftmost) together with the chain code represent the region contour completely. A change between two consecutive chain codes means a change in the direction of the contour (G. Wagenknecht 2007). This chain code is translation invariant, but it is not rotation or scale invariant; it is noise sensitive, and it depends heavily on the starting point (J. Kormos and Vereb 2003). As is obvious from Figure 2, there are two kinds of Freeman chain code: 4-connected and 8-connected. A 4-connected chain code is made of just even chain-code elements (horizontal and vertical links), whereas an 8-connected chain-code string consists of even elements (horizontal and vertical links) and odd elements (diagonal links). Different chain codes can have the same numbers of even and odd elements, denoted n_e and n_o respectively, and hence the same value of a length estimator (V. Toh, C.A. Glasbey et al. 2003).


Figure 2

Freeman chain code: direction of the neighbors (a) 4-connected; (b) 8-connected
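As a small illustration, the sketch below walks an 8-connected Freeman chain from a start pixel and counts its even and odd elements; the mapping of code 0 to the rightward direction with codes increasing counter-clockwise is a common convention and may differ from the exact numbering of Figure 2.

```python
# Offsets for the 8-connected Freeman code, assuming code 0 points right (east)
# and codes increase counter-clockwise (a common convention, assumed here).
FREEMAN8 = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
            4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}

def walk_chain(start, codes):
    """Recover the pixel path of an 8-connected Freeman chain code."""
    x, y = start
    path = [(x, y)]
    for c in codes:
        dx, dy = FREEMAN8[c]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

def even_odd_counts(codes):
    """Count even (isothetic) and odd (diagonal) chain elements, n_e and n_o."""
    n_e = sum(1 for c in codes if c % 2 == 0)
    return n_e, len(codes) - n_e

# example: a short closed chain starting at (0, 0)
print(walk_chain((0, 0), [0, 0, 1, 2, 4, 4, 5, 6]))
print(even_odd_counts([0, 0, 1, 2, 4, 4, 5, 6]))
```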

Another kind of chain code was proposed by Bribiesca (Bribiesca 1999); in 2D it is called the Vertex Chain Code (VCC for short). Figure 3 presents the VCC. It has the following properties:
- Independence of starting point. The chain is closed; therefore the VCC may be starting-point normalized by selecting the minimum integer magnitude for the resulting sequence of elements (the elements form the integer).
- Independence of rotation. This results from the fact that the VCC is based on the number of cell vertices which are in touch with the bounding contour.
- Independence of mirroring transformation.
The chain code is proposed for shapes composed of rectangular, triangular and hexagonal cells. The advantages of the VCC over the Freeman chain code are:
- The chain code is stable under shifting, turning and mirroring of the image, and optionally has a normalized starting point.


- It is suitable for representing shapes based on all regular cells (rectangular, triangular and hexagonal).
- The chain elements are real values and part of the shape, because they are the numbers of vertices of the bounding nodes.
- The VCC is helpful for presenting the relation between the bounding contour and the interior of the shape.

Figure 3 The Vertex Chain Code for rectangular grid(Bribiesca 1999)

LENGTH ESTIMATORS

A variety of length estimators have been proposed up to now. The two most widely used classes are local and global length estimators. Local approaches are based on local metrics such as chamfer metrics, while global approaches are based on a polygonalization of the digital


curve, e.g. directed to the subsequent calculation of maximum-length digital straight segments (DSSs) or of minimum length polygons (MLPs) (Coeurjolly. D and R 2002).

Local metrics or Pixel-based Length Estimators

These estimators were historically the first attempts towards a solution of the length estimation problem. A simple estimate of length just counts the number of pixels. In general this is not unbiased for the length of the underlying smooth curve, and the pixel count differs for 8-connected and 4-connected lines. Weights have therefore been designed with the intention of approximating the Euclidean distance. For example, horizontal and vertical moves in the orthogonal r-grid (r is the grid resolution, the inverse of the grid constant) may be weighted by 1/r, and diagonal moves by \sqrt{2}/r. In order to make length estimation as accurate as possible, the use of statistical analysis has been suggested to find those weights which minimize the mean square error between the estimated and true length of a straight segment. The best linear unbiased estimator for straight lines is defined as follows (L. Dorst and Smeulders 1987):

\hat{L} = \frac{1}{r}\,(0.948\, n_i + 1.343\, n_d) \qquad (1)

where n_i is the number of isothetic steps and n_d the number of diagonal steps in the r-grid.

An edge is the Euclidean segment joining two consecutive vertices, and a digital edge is the discrete segment joining two consecutive vertices; there are as many edges as digital edges and as vertices. The shortest pattern of a supporting edge for which a maximal segment may contain 2n + 1 digital edges is z_n = [0, 2, . . . , 2]. If the convex digital polygon is enclosed


in an m x m grid, then the maximal number n of digital edges included in one maximal segment is upper bounded as n \le \log(4m/\sqrt{2}) / \log(1 + \sqrt{2}) - 1 (Francois de Vieilleville, Jacques-Olivier Lachaud et al. 2005). This theorem bounds the local number of edges used by the local length estimator.
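A minimal sketch of the weighted pixel-count estimator of equation (1), assuming an 8-connected Freeman chain code as input:

```python
import math

def estimated_length(codes, r=1.0):
    """Best linear unbiased straight-line length estimate (equation (1)).

    codes : sequence of 8-connected Freeman chain-code elements
    r     : grid resolution (inverse of the grid constant)
    Even codes are isothetic (horizontal/vertical) moves, odd codes are diagonal.
    """
    n_i = sum(1 for c in codes if c % 2 == 0)   # isothetic steps
    n_d = len(codes) - n_i                      # diagonal steps
    return (0.948 * n_i + 1.343 * n_d) / r

# comparison for a chain of 10 isothetic and 5 diagonal moves
codes = [0] * 10 + [1] * 5
print(estimated_length(codes), 10 + 5 * math.sqrt(2))  # estimate vs. naive weighting
```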

Polygonal approaches

Two kinds of polygonal approaches are considered in this study: the first is based on maximal digital straight segments, and the other on the minimum length polygon.

The first method is based on the definition of digital straight segments (DSS for short). A DSS is a sequence of grid points, each being the grid point closest to the corresponding point of the real segment. If we already have a point with x-coordinate x_i, the next point has x-coordinate x_{i+1}, and for its y-coordinate we must decide between y_i and y_i + 1. To describe a digital straight line on a lattice, the Freeman chain code or the VCC can be used (Ivo Povazan and Uher 1998). Rosenfeld (Reinhard Klette and Rosenfeld 2004) defined the straight line segment by introducing the so-called chord property: a digital arc is a digitization of a straight line segment if and only if it has the chord property. A set M of grid points satisfies the chord property iff, for any two distinct p and q in M and any point r on


the (real) line segment pq, there exists a grid point t in M such that max(|x_r - x_t|, |y_r - y_t|) < 1. DSS recognition algorithms decide whether a given sequence of grid points or chain codes is a DSS; some of them also segment a digital arc or curve into a sequence of maximum-length DSSs. A length estimator is then defined by the resulting polygon or polygonal arc. Here we study the algorithm DR1995 (Debled-Rennesson 1995). Algorithm DR1995 is an efficient linear online DSS recognition algorithm; it has also been applied to DSS recognition in 3D. It is based on the calculation of a narrowest strip defined by the nearest supporting lines below and above. The mathematical background for this approach is arithmetic geometry: the algorithm uses the definition of DSSs (or digital planes) in arithmetic geometry by pairs of linear Diophantine (i.e., integer-parameter) inequalities. In the 2D case, let a and b be relatively prime integers, and let c and w be integers:
D_{a,b,c,w} = \{(i, j) \in \mathbb{Z}^2 : c \le bi - aj < c + w\} \qquad (2)

D_{a,b,c,w} is called an arithmetic line with slope b/a, approximate intercept c/a, and arithmetic width w/a.


Figure 4

Arithmetic line (Reinhard Klette and Rosenfeld 2004)

The two supporting straight lines are

j = \frac{b}{a} i - \frac{c}{a} \qquad (3)

and

j = \frac{b}{a} i - \frac{c}{a} - \frac{w}{a} \qquad (4)

The vertical distance between them is d = w/a. The actual (i.e., geometric) intercept could be specified by (3c + w)/(2a).

This is based on the following theorem (Reinhard Klette and Rosenfeld 2004): any set of grid points D_{a,b,c,max{|a|,|b|}} is the set of grid points of a digital straight line; conversely, for any rational digital straight line, there exist a, b, and c such that the set of grid points of the given digital straight line is D_{a,b,c,max{|a|,|b|}}. It is


possible to state that w = max{|a|, |b|} defines digital straight lines, and as subsets also digital 8-rays or 8-DSSs. For digital 4-rays or 4-DSSs the corresponding width is w = |a| + |b|.

Figure 5

Digital straight line(Reinhard Klette and Rosenfeld 2004)

The assumption for this algorithm is that the lines have slope 0 <= b/a <= 1; thus 0 <= b <= a. This allows a to be used instead of max{|a|, |b|} in what follows. Based on this assumption, the result is specified for the first octant as follows: for a naive line we have w = a, and all the grid points in D_{a,b,c,w} lie between or on the two lines

bx - ay = c \qquad (5)

and

bx - ay = c + a - 1 \qquad (6)


These are the two supporting lines of D_{a,b,c,a}. For the algorithm, any grid point (x, y) on an 8-DSS must satisfy the constraint D: c <= bx - ay < c + a, where a and b are relatively prime integers and c is an integer. The algorithm has to provide the values of a, b, and c. The initial grid point q_1 of a new 8-DSS is identified with the origin (0, 0). It is assumed that the given sequence of grid points involves moves in at most two directions, (1, 0) or (1, 1); thus the process runs in the x-direction in the first octant. Other point sequences are mapped into this one by reflection. At q_1 = (0, 0), we start with the condition
D_1 : 0 \le y < 1 \qquad (7)

We have a = 1, b = 0, and c = 0.

After initializing, the next step is to obtain the next grid point, which must be tested, and the values must be updated according to the algorithm. For a new point q_{n+1} in A_8(q_n), where x_{n+1} > x_n, there are three cases:
- q_{n+1} is between or on the two supporting lines (i.e., no update is needed);
- q_{n+1} is (just) above the upper supporting line;
- q_{n+1} is (just) below the lower supporting line.
Let u_1, u_2 and l_1, l_2 be the points on the upper and lower supporting lines, respectively (Figure 6), where index 1 denotes the point q_i (1 <= i <= n) with the smallest x-coordinate and index 2 the point with the largest x-coordinate.


Figure 6

Test and update the next grid point

Up to this point the algorithm has defined the values of q_1, ..., q_n of the initial 8-DSS by calculating the slope b/a. The remainder of the point q_{n+1} with respect to the slope b/a is

b x_{n+1} - a y_{n+1} \qquad (8)

Let

r = b x_{n+1} - a y_{n+1} - c \qquad (9)

Equation (9) is the basis for deciding the status of the new point with respect to the 8-DSS.


Case 0: if 0 <= r < max{|a|, |b|}, then q_{n+1} is in D_{a,b,c,max{|a|,|b|}}.
Case 1: if r = -1, then {q_1, . . . , q_n, q_{n+1}} is an 8-DSS with a slope defined by the vector u_1 q_{n+1}.
Case 2: if r = max{|a|, |b|}, then {q_1, . . . , q_n, q_{n+1}} is an 8-DSS with a slope defined by the vector l_1 q_{n+1}.
Case 3: if r < -1 or r > max{|a|, |b|}, then {q_1, . . . , q_n, q_{n+1}} is not an 8-DSS.
The logic and the critical points of the algorithm have now been described; Table 1 shows the pseudo code of the algorithm.


Table 1  Algorithm DR1995 (for the first octant; max{|a|, |b|} = a)

Step 1: Let r = b x_{n+1} - a y_{n+1} - c be the remainder of the new point q_{n+1} minus c.
Step 2: If 0 <= r < a, then u_2 = q_{n+1} (if r = 0) or l_2 = q_{n+1} (if r = a - 1), and stop; otherwise, go to Step 3.
Step 3: If r = -1, then l_1 = l_2, u_2 = q_{n+1}, a = |x_{n+1} - u_{11}|, b = |y_{n+1} - u_{12}| (where u_1 = (u_{11}, u_{12})), c = b x_{n+1} - a y_{n+1}, and stop; otherwise, go to Step 4.
Step 4: If r = a, then u_1 = u_2, l_2 = q_{n+1}, a = |x_{n+1} - l_{11}|, b = |y_{n+1} - l_{12}| (where l_1 = (l_{11}, l_{12})), c = b x_{n+1} - a y_{n+1} - a + 1 (or c = b u_{21} - a u_{22}), and stop; otherwise, go to Step 5.
Step 5: The new point does not form a DSS with the previous n points; initialize a new DSS at q_n.
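The following sketch implements the recognition loop of Table 1 for first-octant input (moves of (1, 0) or (1, 1) only). It only reports whether the whole point sequence forms a single 8-DSS; a full implementation would also restart a new segment at the breaking point, as Step 5 requires.

```python
def dss_recognize(points):
    """Incremental DR1995-style 8-DSS recognition in the first octant (a sketch).

    points : list of grid points starting at (0, 0), each successive point
             obtained by a (1, 0) or (1, 1) move.
    Returns True if the whole sequence is one 8-DSS satisfying
    c <= b*x - a*y < c + a, updating a, b, c, u1, u2, l1, l2 as in Table 1.
    """
    a, b, c = 1, 0, 0
    u1 = u2 = l1 = l2 = points[0]          # leaning points on the supporting lines
    for (x, y) in points[1:]:
        r = b * x - a * y - c              # Step 1: remainder of the new point
        if 0 <= r < a:                     # Step 2: inside the strip
            if r == 0:
                u2 = (x, y)
            elif r == a - 1:
                l2 = (x, y)
        elif r == -1:                      # Step 3: just above the upper line
            l1 = l2
            u2 = (x, y)
            a, b = abs(x - u1[0]), abs(y - u1[1])
            c = b * x - a * y
        elif r == a:                       # Step 4: just below the lower line
            u1 = u2
            l2 = (x, y)
            a, b = abs(x - l1[0]), abs(y - l1[1])
            c = b * x - a * y - a + 1
        else:                              # Step 5: not a DSS any more
            return False
    return True

# usage: the digitization of a segment of slope 1/2
pts = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
print(dss_recognize(pts))   # expected True
```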

The other famous length estimator is based on the definition of the minimum length polygon (MLP), obtained from the inner and outer frontier digitizations. Following the given 1-curve in counter-clockwise orientation (in a right-hand coordinate system), the sequence of 1-cells on the right forms a 0-curve, here denoted gamma_1 (the outer frontier in the case of a simple 1-curve), and those on the left form a 0-curve denoted gamma_2 (the inner frontier). A vertex v_i on gamma_1 or gamma_2 is called a convex vertex if the frontier makes


a positive turn at v_i, detectable by D(v_{i-1}, v_i, v_{i+1}) > 0, where D is the determinant given by equation (10):

D = \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = x_1 y_2 + x_2 y_3 + x_3 y_1 - x_3 y_2 - x_2 y_1 - x_1 y_3 \qquad (10)

v_i is called a concave vertex if the frontier makes a negative turn (D(v_{i-1}, v_i, v_{i+1}) < 0), and a collinear vertex if D(v_{i-1}, v_i, v_{i+1}) = 0. The algorithm traces gamma_1 (or gamma_2), detects convex and concave vertices, puts their coordinates into a list L, and marks them as convex or concave.
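A small sketch of this test, classifying a vertex by the sign of the determinant of equation (10):

```python
def turn(p, q, s):
    """Sign of D(p, q, s) from equation (10): >0 convex turn, <0 concave, 0 collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, s
    d = x1 * y2 + x2 * y3 + x3 * y1 - x3 * y2 - x2 * y1 - x1 * y3
    if d > 0:
        return "convex"
    if d < 0:
        return "concave"
    return "collinear"

print(turn((0, 0), (1, 0), (1, 1)))   # counter-clockwise (positive) turn -> convex
```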
For simplicity assume that the coordinates are integers; the coordinates of two successive vertices with indices i and i + 1 satisfy

|x_{i+1} - x_i| + |y_{i+1} - y_i| = 1 \qquad (11)

Only convex vertices of the inner curve gamma_2 and only concave vertices of the outer curve gamma_1 can be vertices of the MLP (Reinhard Klette and Rosenfeld 2004). There exists a mapping from the set of all concave vertices of gamma_2 onto the set of all concave vertices of gamma_1 such that each concave vertex of gamma_1 corresponds to at least one concave vertex of gamma_2. The numbers in Figure 7 denote successive vertices in the list L:


Vertex 1 is a start vertex (i.e., an already known MLP vertex); vertices 3 and 5 are successive convex vertices on gamma_2; vertices 2, 4, and 6 are successive concave vertices on gamma_1. Vertex 7 is not between the negative (black line) and positive (white line) sides of the sector (6, 1, 5); therefore 5 is the next MLP vertex and becomes a new start vertex.

The algorithm starts after putting all of the vertices of gamma_2 into L, and then replaces each concave vertex of gamma_2 in L with its corresponding concave vertex of gamma_1 by modifying its coordinates by 1, where the sign depends on the orientations of the incident edges. L then contains all (plus others) of the vertices that will form the MLP.

Figure 7

The process in the MLP algorithm (Reinhard Klette and Rosenfeld 2004)


The pseudo code for the MLP algorithm and the calculation of its length L is as follows:
1. Initialize the list L = (v_1, . . . , v_n) as described above; it contains all of the vertices on gamma_2 except the concave vertices, which are replaced by the corresponding concave vertices on gamma_1. Each v_i in L is labelled by the sign of D(v_{i-1}, v_i, v_{i+1}).
2. Let k := 1, a := 1, b := 1 and i := 2. Let L := 0 and p_1 := v_1. //v_1 is the first MLP vertex//
3. If i > n + 1, stop.
4. If i <= n, then j := i; else j := 1. //go back to v_1//
5. If D(p_k, v_b, v_j) > 0, then //v_j lies on the positive side// {k := k + 1, p_k := v_b, i := b, a := b, and L := L + d_e(p_{k-1}, p_k)}; else
   (a) if D(p_k, v_a, v_j) >= 0, then //v_j is in the sector// if v_j has a positive label, then b := j, else a := j; else //v_j lies on the negative side//
   (b) {k := k + 1, p_k := v_a, i := a, b := a, and L := L + d_e(p_{k-1}, p_k)}.
6. Let i := i + 1 and go to Step 3.
The algorithm has linear time complexity.

Tangent-based Estimators

A tangent vector integration process can be used to estimate the length of the curve. Equation (12) can be approximated by using discrete estimators of the products (x', y') dt:

l(\gamma) = \int_a^b \|(\dot{x}(t), \dot{y}(t))\|\, dt \qquad (12)

An optimization method makes it possible to compute all of the discrete tangents in linear time by using the discrete tangent computed at each vertex to initialize the discrete tangent at the next vertex; generally this requires only a constant number of changes.

Stereological Approach

Nonparametric Approach

For these last two methods, the definitions are given as a comparison between them. Unlike the stereological approach, the nonparametric approach is not concerned with unbiased estimation, but with asymptotic properties such as consistency and convergence rates. This estimator is intended to work asymptotically in any dimension d under quite general shape restrictions. It depends on a smoothing parameter which must be carefully chosen. The method also provides, as a by-product, an estimator of the boundary of the body G under study. In contrast, stereological methods are not usually concerned with the global estimation of sets; they are rather focused on the estimation of some real parameter (length, volume, surface area, ...). The sample data consist of randomly selected points. In stereology the available information for estimating lengths and surface areas usually comes either from one- or two-dimensional sections or from systematic grids (Antonio Cuevas, Ricardo Fraiman et al. 2007).


CONCLUSIONS

For real images the true feature size is not known, and the most feasible means of comparing possible estimators is via a simulation study, which allows assessment of the estimation error since the true feature length is known (V. Toh, C.A. Glasbey et al. 2003). The true length of an underlying smooth line is calculated using the Euclidean distance between its (real) end points, and true curve length is computed using a numerical approximation of arc length (V. Toh, C.A. Glasbey et al. 2003). Length estimators depend strongly on the method of discretization, whose output can be a chain code. In this study we focus on length estimators that take a digitized curve, i.e. a chain code, as input. A review of two different kinds of chain codes was given; it was shown that the VCC has advantages over the Freeman chain code. A useful property that a discrete geometric estimator may have is to converge toward the geometric quantity of the continuous shape boundary when the digitization grid gets finer (Francois de Vieilleville, Jacques-Olivier Lachaud et al. 2005). In (Coeurjolly. D and R 2002) it is proved that, except for local metrics, the other kinds of length estimators satisfy the multigrid convergence theorem. Extensions to 3D length estimators exist for the DSS, MLP and local metrics approaches.


REFERENCE

ANTONIO CUEVAS, RICARDO FRAIMAN, ET AL. (2007). "A nonparametric approach to the estimation of lengths and surface areas." The Annals of Statistics 35(3): 1031-1051.
BRESENHAM, J.E. (1965). "Algorithm for computer control of a digital plotter." IBM Systems Journal 4: 25-30.
BRIBIESCA, E. (1999). "A new chain code." Pattern Recognition 32: 235-251.
COEURJOLLY, D. AND KLETTE, R. (2002). A comparative evaluation of length estimators. ICPR 2002, Quebec, Canada.
D. COEURJOLLY, S. MIGUET, ET AL. (2004). "2D and 3D visibility in discrete geometry: an application to discrete geodesic paths." Pattern Recognition Letters 25: 561-570.
FERNAND S. COHEN AND J.-Y. WANG (1994). "Part I: Modeling Image Curves Using Invariant 3-D Object Curve Models - A Path to 3-D Recognition and Shape Estimation from Image Contours." IEEE Transactions on Pattern Analysis and Machine Intelligence 16(1): 1-12.
FESCHET, F. (2005). "Canonical representations of discrete curves." Pattern Analysis and Applications: 84-94.
FRANCOIS DE VIEILLEVILLE, JACQUES-OLIVIER LACHAUD, ET AL. (2005). Maximal Digital Straight Segments and Convergence of Discrete Geometric Estimators. SCIA 2005, Joensuu, Finland, Springer-Verlag Berlin Heidelberg, LNCS 3540: 988-997.
FRANK Y. SHIH AND W.-T. WONG (2001). "An adaptive algorithm for conversion from quadtree to chain codes." Pattern Recognition 34: 631-639.
G. WAGENKNECHT (2007). "A contour tracing and coding algorithm for generating 2D contour codes from 3D classified objects." Pattern Recognition 40: 1294-1306.
IVO POVAZAN AND L. UHER (1998). The Structure of Digital Straight Line Segments and Euclid's Algorithm. SCCG'98.


J. KORMOS AND K. VEREB (2003). "Recognition of Chain-Coded Patches with Statistical Methods." Mathematical and Computer Modelling 38: 903-907.
KOVALEVSKY, V. (1997). "Applications of Digital Straight Segments to Economical Image Encoding." Lecture Notes in Computer Science, Proceedings of the 7th International Workshop on Discrete Geometry for Computer Imagery 1347: 51-62.
L. DORST AND A.W.M. SMEULDERS (1987). "Length estimators for digitized contours." Graphics and Image Processing 34: 344-371.
MICHEL COUPRIE, GILLES BERTRAND, ET AL. (2003). "Discretization in 2D and 3D Orders." Graphical Models 65: 77-91.
PHILIP J. SCHNEIDER AND D.H. EBERLY (2003). Geometric Tools for Computer Graphics. Elsevier Science.
R. KLETTE (1985). "The m-dimensional grid point space." Computer Vision, Graphics, and Image Processing: 1-12.
REINHARD KLETTE AND A. ROSENFELD (2004). Digital Geometry: Geometric Methods for Digital Picture Analysis. Morgan Kaufmann Publishers Inc.
V. TOH, C.A. GLASBEY, ET AL. (2003). A Comparison of Digital Length Estimators for Image Features. SCIA 2003, Halmstad, Sweden, Springer-Verlag Berlin Heidelberg, LNCS 2749: 961-968.
V.A. KOVALEVSKY (1990). New Definition and Fast Recognition of Digital Straight Segments and Arcs. ICPR'90, IEEE.
YI XIAO, JU JIA ZOU, ET AL. (2001). "An adaptive split and merge method for binary image contour data compression." Pattern Recognition Letters 22: 299-307.


10
THE CHARACTERISTICS AND MAPPING ALGORITHM OF RECTANGULAR VERTEX CHAIN CODE
Lily Ayu Wulandhari and Habibollah Haron

INTRODUCTION

Image representation has always been an important and interesting topic in image processing and pattern recognition. Since the chain code representation scheme was introduced by Freeman in 1961, now known as the Freeman Chain Code (FCC), its evolution and improvement have been a widely studied research topic. In 1999, Bribiesca introduced a new two-dimensional chain code scheme called the Vertex Chain Code (VCC). An element of this chain indicates the number of cell vertices which are in touch with the bounding contour of the shape at that element position. The VCC is defined for three regular cells, namely rectangular, triangular, and hexagonal. This chapter describes the characteristics and mapping algorithm of one of the VCC cells, namely the rectangular VCC cell.


The characteristics are proposed for visualizing thinned binary images as rectangular cells, especially thinned binary images with holes. The characteristics cover the starting point, rotation, and the sum of the chain elements. The mapping algorithm consists of a cell-representation algorithm that represents a thinned binary image as rectangular cells, a transcribing algorithm that transcribes the cells into the Vertex Chain Code, and a validation algorithm that visualizes the Vertex Chain Code as rectangular cells.

VERTEX CHAIN CODE

Bribiesca introduced the Vertex Chain Code (VCC) in 1999, and this chain code complies with the three objectives that Freeman (1974) proposed. Some important characteristics of the VCC are: (1) the VCC is invariant under translation and rotation, and optionally may be invariant under starting point and mirroring transformation; (2) using the VCC it is possible to represent shapes composed of triangular, rectangular, and hexagonal cells (Figure 1); (3) the chain elements represent real values, not symbols as in other chain codes, are part of the shape, indicate the number of cell vertices of the contour nodes, and may be operated on to extract interesting shape properties; (4) using the VCC it is possible to obtain relations between the contour and the interior of the shape.


Figure 1

Example of VCC Cells: (a) Triangular Cell (b) Rectangular Cell (c) Hexagonal Cell

Figure 2

Example of Rectangular Cells-VCC

In the Vertex Chain Code, the boundaries or contours of any discrete shape composed of regular cells can be represented by chains; these chains therefore represent closed boundaries. The minimum perimeter of a closed boundary corresponds to a shape composed of only one cell. An element of a chain indicates the number of cell vertices which are in touch with the bounding contour of the shape at that element position (E.


Bribiesca, 1999). Figure 2 shows the Vertex Chain Code of rectangular VCC cells, whose elements indicate the number of cell vertices in touch with the bounding contour of the rectangle at each element position. An important property in image representation is connectivity, since it defines what is regarded as a connected region. For rectangular cells there are two kinds of connectivity: 4-connectivity (Figure 3) and 8-connectivity (Figure 4). Here, rectangular cells with 4-connectivity are used.

           (m-1, n)
(m, n-1)   (m, n)     (m, n+1)
           (m+1, n)

(m-1, n-1)  (m-1, n)  (m-1, n+1)
(m, n-1)    (m, n)    (m, n+1)
(m+1, n-1)  (m+1, n)  (m+1, n+1)

Figure 3 4-connectivity

Figure 4 8-connectivity

CHARACTERISTICS OF VERTEX CHAIN CODE

In 1999 Bribiesca proposed the Vertex Chain Code for shapes without holes. This chapter describes some characteristics for shapes with holes. For shapes with holes we use the terms outer and inner: the outer term denotes the external boundary, while inner denotes the internal boundary (Figure 5).


Composing the Vertex Chain Code is no different for shapes with holes than for shapes without holes: each code indicates the number of cell vertices in touch with the bounding contour of the rectangle at that element position. Figure 6 shows the Vertex Chain Code of a shape with holes.

Figure 5  Boundary for a shape with holes

Figure 6 Vertex Chain Code for Shape with Holes

The characteristics for shapes without holes were presented by Bribiesca; some characteristics of the rectangular Vertex Chain Code for shapes with holes are presented in this section. They are described below.


a. Starting Point

For shapes without holes, the Vertex Chain Code is made invariant to the starting point by rotating the digits (Bribiesca, 1999). This rule also applies to shapes with holes, for both the outer and the inner boundary. The shape in Figure 6 has the outer boundary code 121313131313131313131313131122222222222222222222222221131313131313131313131313, while the inner boundary has the code 323131313133222222222331313131. If we rotate the digits of the outer code it becomes 112222222222222222222222222113131313131313131313131312131313131313131313131313, and the visualization of the shape does not change. For the inner boundary, however, the rotation of the digits must be consistent with the rotation of the outer boundary digits. When visualizing the Vertex Chain Code as rectangular cells, the problem lies with the inner boundary. To visualize the Vertex Chain Code of an inner boundary as rectangular cells, the following must be satisfied: 1. the starting point of the inner boundary is the same as the starting point of the outer one; 2. the length from the inner cell to the closest outer cell is 1.

b. Independence of Rotation

The Vertex Chain Code is invariant under rotation, because the VCC is based on the number of cell vertices which are in touch


with the bounding contour (E. Bribiesca, 1999). The outer boundary of a shape with holes is also invariant under rotation, but the rotation of the inner boundary must keep in track with the outer one (Figure 7).

Figure 7

Rotation counterclockwise of VCC

c. Sum of the Chain Code Elements


Theorem 1. For any shape without holes composed of pixels, the sum of the chain elements of the shape is equal to 2n - 4, i.e.

\sum_{i=1}^{n} a_i = 2n - 4 \qquad (1)


while for a shape with holes

\sum_{i=1}^{n} a_i = 2n \qquad (2)

where a_i is the i-th code and n is the number of codes.


Figure 8 shows examples of this formula.



Figure 8 Shapes with Holes

Here n = n_outer + n_inner. Figure 8 (a) has n_outer = 32 and n_inner = 16, so using equation (2) the sum of the chain code elements is 96; adding all the elements of the code also gives 96. Similarly, for Figure 8 (b), n_outer = 78, n_inner = 30, n = 108, and the sum of the chain code elements equals 216.
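A small sketch that checks Theorem 1 and equation (2) for a given VCC string; the single-cell and two-cell examples are illustrative, and for a shape with holes the string is taken to be the concatenation of the outer and inner boundary codes (n = n_outer + n_inner):

```python
def vcc_sum_check(code, has_hole=False):
    """Return (sum of VCC elements, expected value): 2n - 4 without holes, 2n with holes."""
    digits = [int(c) for c in code]
    n = len(digits)
    expected = 2 * n if has_hole else 2 * n - 4
    return sum(digits), expected

print(vcc_sum_check("1111"))     # a single rectangular cell: (4, 4)
print(vcc_sum_check("112112"))   # two cells side by side: (8, 8)
```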

THE MAPPING ALGORITHM OF RECTANGULAR VERTEX CHAIN CODE

This section describes the mapping algorithm of the rectangular Vertex Chain Code. In this case the images used are the shapes without holes proposed by Bribiesca. The section is divided into six parts: the methodology, pre-processing, the three parts of the mapping algorithm (namely the cell-representation, transcribing, and validation algorithms), and finally the experimental results.

a. The Methodology

The mapping algorithm of the rectangular VCC consists of four processes, namely pre-processing, cell-representation, transcribing, and validation, as shown in Figure 9. The pre-processing step thins a binary image into a thinned binary image; this process is explained in Section b. The thinned binary image is then represented as rectangular VCC cells, which is called cell-representation. The next process transcribes the rectangular VCC cells into the Vertex Chain Code, which is called the transcribing process. The last is the validation process; in this


process the Vertex Chain Code is visualized as rectangular cells to validate the cell-representation and transcribing algorithms. The result of the validation shows the similarity between the visualization of the thinned binary image as rectangular cells and the visualization of the Vertex Chain Code as rectangular cells. These last three processes, the cell-representation, transcribing and validation algorithms, are together called the mapping algorithm. The mapping algorithm is presented in Sections c, d and e.

Figure 9  Flow of the mapping algorithm (pre-processing: binary image, thinning algorithm, thinned binary image; mapping algorithm: cell-representation algorithm, transcribing algorithm, Vertex Chain Code, validation algorithm, rectangular cell)


b. The Pre-Processing

This algorithm uses a thinned binary image as input. Binary images are images whose pixels have only two possible intensity values; they are normally displayed as black and white, and numerically the two values are often 0 for black and either 1 or 255 for white. In the simplest case, an image may consist of a single object or several separated objects of relatively high intensity, which allows figure separation by thresholding. In order to create the two-valued binary image, a simple threshold may be applied so that all the pixels in the image plane are classified into object and background pixels. A binary image function can then be constructed such that pixels above the threshold are foreground (1) and pixels below the threshold are background (0) (Figure 10). For several purposes a binary image needs to be thinned. A thinned binary image is a binary image in which the width of the objects is reduced to a single pixel (Figure 11). Thinning (W. Leung, C. M. Ng, and P. C. Yu, 2000) is an important pre-processing step in pattern analysis because it reduces the memory required for storing the essential structural information present in a pattern. For this purpose, the thinning algorithm created by Haron et al. (2003) is applied. This thinning algorithm uses two-valued connectivity rules: a pixel of value 1 is replaced by 0 when the number of 1-pixels among its eight-connected neighbours is greater than 3.
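A minimal sketch of the thresholding step described above, assuming a grayscale image held in a NumPy array; the threshold value is an illustrative parameter, not the one used in the cited work.

```python
import numpy as np

def to_binary(img, threshold=128):
    """Classify pixels into foreground (1) and background (0) by a global threshold."""
    return (np.asarray(img) > threshold).astype(np.uint8)

# usage (hypothetical): binary = to_binary(gray_image, threshold=100)
```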


0000000000000000000000000000000000000000000000000000 0000000000000000000000000001111000000000000000000000 0000000000000000000000000111111100000000000000000000 0000000000000000000000111111111100000000000000000000 0000000000000000000001111111111111000000000000000000 0000000000000000000111111111011111100000000000000000 0000000000000000001111111000001111110000000000000000 0000000000000000111111110000000111111100000000000000 0000000000000001111111000000000011111111000000000000 0000000000000011111110000000000000111111100000000000 0000000000000111111000000000000000001111111000000000 0000000000001111110000000000000000000011111111000000 0000000000111111100000000000000000000001111111100000 0000000001111111000000000000000000000000011111111000 0000000011111110000000000000000000000000000111111000 0000001111111000000000000000000000000000000011111110 0000011111110000000000000000000000000000000000111111 1111111111100000000000000000000000000000000000011111 1111111110000000000000000000000000000000000000000111 1111111100000000000000000000000000000000000000000111



Figure 10 Binary Image

0000000000000000000000000000000000000000000000000000 0000000000000000000000000001111000000000000000000000 0000000000000000000000000110000100000000000000000000 0000000000000000000000111000000100000000000000000000 0000000000000000000001000000000011000000000000000000 0000000000000000000110000000000000100000000000000000 0000000000000000001000000000000000010000000000000000 0000000000000000110000000000000000000100000000000000 0000000000000001000000000000000000000011000000000000 0000000000000010000000000000000000000000100000000000 0000000000000100000000000000000000000000011000000000 0000000000001000000000000000000000000010000110000000 0000000000110000000000000000000000000000000001000000 0000000001000000000000000000000000000000000000100000 0000000011000000000000000000000000000000000100001000 0000011000000000000000000000000000000000000000001100 0000010000000000000000000000000000000000000000000001 1111110000000000000000000000000000000000000000000001 1000000000000000000000000000000000000000000000000001 1000000000000000000000000000000000000000000000000001


Figure 11 Thinned Binary Image

In this algorithm, every element of the thinned binary image is stored as an array variable, and all subsequent operations on the image are carried out in terms of its rows and columns.

c. The Cell-Representation Algorithm

The cell-representation (visualizing) algorithm of rectangular-VCC represents a thinned binary image as its rectangular cells. The algorithm takes a two-valued-connectivity thinned binary image as input. Each code 1 in the thinned binary image produces one rectangular cell, and the direction in which a code 1 is adjacent to another code 1 determines where the next rectangle is formed. Table 1 shows the representation of rectangular-VCC formed by the direction of a code 1 that is adjacent to another code 1. When every code in the binary image has been visualized, a line drawing consisting of rectangular cells is created (Haron et al., 2005; Subri et al., 2006).


Table 1 Representation of Rectangular VCC

Table 2 Eight Direction Connectivity

(row+1, column-1) (row, column-1) (row-1, column-1)

(row+1, column) (row, column) (row-1, column)

(row+1, column+1) (row, column+1) (row-1, column+1)

The algorithm considers the eight directions in which a code 1 may be adjacent to the others; each code 1 is compared with its eight neighbours. Table 2 shows the eight-direction connectivity used here. Every code 1 fills one rectangle of unit length. In this algorithm, every horizontal line is drawn from left to right and every vertical line is drawn from bottom to top. Based on these rules, the visualizing algorithm of rectangular VCC is created.
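A minimal sketch of this cell-representation idea, under the assumption that the thinned image is given as a list of 0/1 rows, is shown below; it simply turns every 1-pixel into a unit cell keyed by the (x, y) of its lower-left corner, with the row axis flipped so that the bottom-to-top drawing convention holds. The function name is illustrative and the actual drawing of the lines is omitted.

    def cells_from_thinned_image(img):
        # img is a list of rows of 0/1 values (the thinned binary image).
        # Every 1-pixel becomes one unit rectangular cell, stored as the (x, y)
        # of its lower-left corner; adjacent 1-pixels therefore yield adjacent
        # cells, tracing out the line drawing of unit squares described above.
        height = len(img)
        cells = set()
        for r, row in enumerate(img):
            for c, value in enumerate(row):
                if value == 1:
                    cells.add((c, height - 1 - r))   # flip rows: y grows upwards
        return cells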


d. The Transcribing Algorithm

The transcribing algorithm transcribes the rectangular cells into the vertex chain code, again using eight-direction connectivity. The rectangular vertex chain code has three different codes, namely 1, 2, and 3; each code indicates the number of cell vertices that are in touch with the bounding contour of the shape at that element position (E. Bribiesca, 1999). The algorithm focuses on the corners of each rectangular cell; each corner of a rectangle in the cells is labelled A, B, C, or D (Figure 12), and the algorithm handles every corner by its own rules according to the eight-direction connectivity.

Figure 12 Rectangle in Rectangular Cells
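To make the code definition concrete, the sketch below counts, for a grid vertex lying on the bounding contour, how many occupied unit cells touch it; that count is exactly the vertex chain code element 1, 2, or 3 at that vertex. It is a generic illustration of Bribiesca's definition using the cell set produced earlier, not the authors' A/B/C/D corner rules, and the function name is illustrative.

    def vcc_element(cells, vertex):
        # cells is the set of (x, y) lower-left corners of the unit cells; a grid
        # vertex (x, y) touches at most four unit cells, whose lower-left corners
        # are listed below.  For a vertex on the bounding contour the count of
        # occupied cells is the vertex chain code element (1, 2 or 3).
        x, y = vertex
        incident = [(x - 1, y - 1), (x, y - 1), (x - 1, y), (x, y)]
        return sum(1 for cell in incident if cell in cells)

A complete transcriber would walk the bounding contour vertex by vertex and record this count at every vertex.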

e. The Validation Algorithm

The validation algorithm of rectangular-VCC is used to validate the cell-representation and transcribing algorithms: it visualizes the vertex chain code back into rectangular cells. It is developed by dividing the direction into two cases, namely the clockwise and the counter-clockwise direction.


The reconstruction is formed according to 24 shapes of rectangular cells, with eight shapes for each of the codes 1, 2, and 3. The shape drawn for a code depends on the previous code, except for the code at the starting point. The algorithm is invariant under the starting point, so it does not matter which code is chosen as the starting point. Table 3 shows the shapes of rectangular VCC according to the direction.
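Because the code is invariant under the starting point, two codes describe the same contour whenever one is a cyclic rotation of the other; the small helper below (illustrative only, not part of the authors' system) makes that property explicit.

    def same_code_up_to_start(code_a, code_b):
        # Two closed vertex chain codes are equivalent if one is a cyclic
        # rotation of the other, i.e. they differ only in the starting point.
        if len(code_a) != len(code_b):
            return False
        doubled = list(code_a) + list(code_a)
        n = len(code_b)
        return any(doubled[i:i + n] == list(code_b) for i in range(len(code_a)))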

Table 3 Shapes of Rectangular VCC According to the Direction

(Shape diagrams for codes 1, 2, and 3, shown for the clockwise direction and for the counter-clockwise direction)


Based on Table 3, the validation algorithm of rectangular VCC is created; it is likewise divided into the two directions, clockwise and counter-clockwise, because the difference in direction influences the next shape of the cells.
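Before redrawing the cells, a quick sanity check can also be applied to the code itself: for a closed shape built from square cells and containing no holes, the number of 1s in the vertex chain code exceeds the number of 3s by exactly four, a property that follows from the code definition. The helper below is an illustrative check of that kind, not the authors' validation algorithm; it assumes the code is given as a sequence of integers.

    def passes_closed_shape_check(code):
        # code is a sequence of integer elements (1, 2 or 3).  For a closed,
        # hole-free shape made of square cells the code must contain exactly
        # four more 1s than 3s.
        elements = list(code)
        return elements.count(1) - elements.count(3) == 4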

f. Experimental Results

All of these algorithms are tested and validated using three thinned binary images, namely an L-block, a hexagonal shape, and a pentagon. Table 4 shows the result of the experiment using the cell-representation, transcribing, and validation algorithms: the thinned binary image is represented as rectangular-VCC cells by the cell-representation algorithm, the rectangular-VCC cells are transcribed into the vertex chain code by the transcribing algorithm, and finally the cell-representation and transcribing algorithms are validated by the validation algorithm, which visualizes the vertex chain code back into rectangular cells. This entire procedure is called the mapping algorithm.


Table 4 Rectangular-VCC Cells and Vertex Chain Code of Three Thinned Binary Images: (a) L-Block, (b) Hexagonal, (c) Pentagon


The interface of the mapping algorithm of the rectangular VCC system is built using Visual Basic 6. Figure 13 shows the interface of the system used in testing and validating the mapping algorithm of the rectangular vertex chain code.

Figure 13 The Interface of the Prototype System

Part 3 in Figure 13 is the interface of the validation algorithm: the input is a vertex chain code, and the code is then visualized as rectangular cells. The interface shows that the rectangular cells obtained by visualizing the vertex chain code are similar to the rectangular cells obtained by visualizing the thinned binary image. The interface covers the three processes of the mapping algorithm shown in Figure 9. The input is a thinned binary image, which is first represented as rectangular-VCC cells.


The rectangular-VCC cells are then transcribed into the vertex chain code, and the last process visualizes the vertex chain code again as rectangular cells. The rectangular cells and the code are displayed automatically when the process is finished.

CONCLUSIONS

The vertex chain code rules for shapes without holes and shapes with holes are similar; they differ only in a few characteristics. For shapes with holes, the inner boundary depends on the outer boundary, especially for rotation: the inner boundary cannot be rotated without taking the rotation of the outer boundary into account. The mapping algorithm has been tested and validated, in cell-representation and in transcribing thinned binary images into VCC, using three thinned binary image objects, namely an L-block, a hexagonal shape, and a pentagon. The results show that the cell-representation algorithm is capable of representing a thinned binary image as rectangular-VCC cells. Likewise, the transcribing algorithm is capable of transcribing the rectangular-VCC cells into the vertex chain code, and the validation algorithm produces rectangular cells that are similar to those produced by the cell-representation algorithm. This entire procedure is called the mapping algorithm of the rectangular vertex chain code.


REFERENCES

H. FREEMAN, On the encoding of arbitrary geometric configurations, IRE Transactions EC-10 (2), 1961, pp. 260-268.
E. BRIBIESCA, A new chain code, Pattern Recognition, Vol. 32, Issue 2, 1999, pp. 235-251.
H. FREEMAN, Computer processing of line-drawing images, ACM Computing Surveys 6, 1974, pp. 57-97.
W. LEUNG, C. M. NG, AND P. C. YU, Contour following parallel thinning for simple binary images, 2000 IEEE International Conference on Systems, Man, and Cybernetics, Vol. 3, 2000, pp. 1650-1655.
H. HARON, D. MOHAMAD, AND S. M. SHAMSUDDIN, Extraction of junction, lines, and regions of irregular line drawing: The chain code processing algorithm, Jurnal Teknologi No. 38(D), Penerbit UTM, 2003, pp. 1-28.
H. HARON, D. MOHAMAD, AND S. M. SHAMSUDDIN, Enhancement of thinning algorithm for 2D line drawing of 3D objects, Journal of Mathematics & Computer Sciences (Computer Science Series), India, December 2003, pp. 169-174.
H. HARON, S. M. SHAMSUDDIN, AND D. MOHAMAD, Corner detection of chain-coded representation of thinned binary image, International Journal of Computer Mathematics, U.K., Vol. 82, No. 8, August 2005, pp. 941-950.
S. H. SUBRI, Analisis rangkaian neural dalam pengesanan simpang bagi penterjemah lakaran pintar (Application of neural network in corner detection for an intelligent sketch interpreter), MSc Thesis, Universiti Teknologi Malaysia, 2006.
S. H. SUBRI, H. HARON, AND R. SALLEHUDDIN, Neural network corner detection of vertex chain code, ICGST International Journal on Artificial Intelligence and Machine Learning (AIML), Vol. 6, Issue 1, January 2006, pp. 37-43.


11
GEOMETRIC REASONING ALGORITHM IN EXTRACTION AND RECOGNITION OF 3D OBJECT FEATURES
Zuraini Sukimin and Habibollah Haron

INTRODUCTION

Geometric computation has been widely used in CAD since the 1960s. CAD is the technology concerned with the use of computer systems to assist in the creation, modification, analysis, and optimization of a design. The challenge is how to make a CAD system able to design intelligently and to perform geometric reasoning. Michael Brady, in Robotics Science, writes that reasoning about problems involving geometry is closely related to applications and is a challenge for AI reasoning techniques. Martin (2000) defined the challenges in geometric reasoning as, first, the ability of the software to define geometric information and, second, the ability of the software to convert a geometric object representation from one form to another so that it suits the particular task.


Understanding the shape of an object is essential in CAD for applying geometric computation. CAD files contain detailed geometric information about a part. Geometry is usually represented in the CAD system in terms of low-level geometric entities such as vertices, edges, and surfaces, or in terms of solid entities such as cubes, cylinders, and several others. Features in geometry can be described as groupings of topological entities of a component that are semantically significant in its production. The process of recognizing object features in 3D may involve extraction from the design code. This extraction process rejects the parts of the code that are suspected to be redundant and selects only the relevant information from the design code. In the next section, work on feature extraction in 3D is reviewed and the extraction algorithms are discussed. This is followed by the research methodology, which explains how the research was carried out, and then by the results: the extraction from the design code, the design of the features themselves, and the algorithm. The last section presents the conclusions reached and outlines the research prospects.

LITERATURE REVIEW

Analysis of previous related work by multiple researchers shows that either a feature selection algorithm or a feature extraction algorithm is needed prior to classification. Such algorithms will be an important part of future application programs for checking design rules and for generating manufacturing advice and plans.


Nafis & Anwarul (2001) explain manufacturing feature recognition of a rotational component using the DXF format, integrating two independent systems, CAD and CAPP. In their work, the geometric information of a rotational part is translated into manufacturing information through a DXF file. In order to recognize the different features of a part from the file, the authors introduced a feature recognition algorithm in which the geometric information is stored with reference to the respective DXF codes for each part, and they presented the logic used to determine the different types of lines and their orientation for each part. Aslan et al. (1997), in Data extraction from CAD model for rotational parts to be machined at turning centres, state that the feature extraction process begins with an initial pass over the DXF file, which helps in deciding the process types for the part. The decision on the feature type is based on the production rules defined for each feature. The system evaluates the extracted coordinates by comparing two, three, or four of them to define features: the procedure takes the first record from the DXF file and compares it with its immediate successor, and if there is an inequality between coordinates, the procedure continues by checking against another inequality. The authors introduced two extraction algorithms in implementing their work, namely an algorithm for the coordinates that define the part profile and an algorithm for the diameters and lengths of the part; these two algorithms come with validating conditions for the recess features and a pre-defined binary decision tree. Nasr & Kamrani (2006) proposed a new methodology for extracting manufacturing features from a CAD system; the methodology consists of three main phases: (1) a data file converter, (2) an object form feature classifier, and (3) a manufacturing features classifier. The first phase converts CAD data in IGES/B-rep format into a proposed object-oriented data structure, the second phase classifies the geometric features of the designed part obtained from the data file converter into different feature groups, and the third phase maps the extracted features to a process point of view.


To realize the proposed methodology, the authors introduced an algorithm for extracting geometric entities from CAD files, which consists of four separate algorithms: an algorithm for extracting form features from CAD files, followed by an algorithm for determining the concavity of an edge, an algorithm for determining the concavity of a loop, and finally an algorithm for feature extraction (production rules). Mok & Wong (2006), in Automatic feature recognition for plastic injection moulded part design, proposed automatic recognition of features at three hierarchical levels: primitive features, complex features, and high-level complex features. The proposed method adopts a hybrid approach with an algorithm; the procedure and the algorithm consist of three levels, starting with level 1, primitive product root feature recognition, followed by level 2, primitive product feature recognition, and level 3, complex product feature recognition.

RESEARCH METHODOLOGY

The procedure for automated recognition and extraction of 3D object features is to match the geometric entities against the database of the design code. Different CAD packages use different types of database structure to store the information of a part in a CAD file; therefore, the features in the feature database should be pre-defined. Figure 1 briefly explains the process flow of the system. It starts with the design of the part, which has been converted to DXF code as input; designing the part features and converting the CAD file was done using Mastercam. The code is then read line by line by the feature extraction algorithm in order to capture the coordinates and the original drawing from the code.


Extraction and recognition are invoked when the input data are transformed into a set of features, in situations where the input data to an algorithm are too large to be processed and are suspected to be redundant, i.e. much data but not much information. If the extracted features are carefully chosen, it is expected that the feature set will capture the relevant information from the input data. In order to properly prove both algorithms, they have been run using the Matlab software and tested with the different features that have been identified.

Figure 1 Information flow of the system: the user designs the part using Mastercam and translates the CAD file to DXF; the DXF code carries the geometrical information, which is passed to the feature extraction stage (an algorithm to classify the code and an algorithm to map the extracted features), producing the extracted 3D object features; the DXF code is also used for testing


EXTRACTION OF FEATURES FROM CAD FILES

The research methodology described above is used for extracting and recognizing the object features, and the development is done using the Matlab software. The designed object combines four different features, known as hole, step, slot, and pocket, and the DXF code for each feature is released. Figure 2 shows the design of the object features. After designing the 3D object features, the CAD user has to convert the design file to DXF format to obtain the code. The DXF code is used as the input for the development process, and the geometric information is obtained through this code.

Figure 2 Combined features


The start point and end point are the basic entity information used to extract object features in the DXF format. Each line of the designed part is recognized by its value ID. Table 1 below displays part of the analyzed code for the design of the features. Each point is declared as a start point or an end point at the same time, and each point is connected to the others at different positions. The algorithm for extraction is able to extract many object features and is developed for prismatic objects; the same algorithm is used for the different features.

Define:
  Start
  Read 0 = SECTION
  Read 2 = ENTITIES
  If 2 = ENTITIES, 0 = LINE
    Else, Read 2 = ENTITIES
  Read 0 = LINE
  If 0 = LINE, 100 = AcDbLine
    Plot 10: X start; 20: Y start; 30: Z start
    Plot 11: X end; 21: Y end; 31: Z end
  Else, 0 = POINT, 100 = AcDbPoint
    Plot 10: X point; 20: Y point; 30: Z point
  If not 0 = POINT, Read 0 = LINE
  Read 0 = ENDSEC
  End
  If not 0 = ENDSEC, Read 0 = LINE
End


The algorithm starts by loading the DXF code. The code is then read line by line until the first SECTION and ENTITIES entries are found. If SECTION appears with group code 0 and ENTITIES with group code 2, the next row is checked to see whether it is a LINE or a POINT. If LINE appears first with group code 0, the algorithm continues with AcDbLine, which has group code 100, and plots the X, Y, and Z coordinates, where group code 10 belongs to X, 20 to Y, and 30 to Z of the start point, followed by 11, 21, and 31 as the X, Y, and Z of the end point. If LINE does not appear, the algorithm looks for POINT with group code 0 and AcDbPoint with group code 100, and then plots 10 for the X coordinate, 20 for the Y coordinate, and 30 for the Z coordinate. If POINT does not have group code 0, the algorithm reads again after ENTITIES. The algorithm finishes, after the LINE and POINT entities, when ENDSEC is found with group code 0; otherwise the previous steps are repeated until every line of the DXF code has been read.
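A rough, non-authoritative Python counterpart of this reading loop is sketched below. It assumes an ASCII DXF file, in which data are stored as pairs of lines (a numeric group code followed by its value); the function name and the returned data structure are illustrative, and only LINE and POINT entities inside the ENTITIES section are handled.

    def extract_lines_and_points(dxf_path):
        # Read the (group code, value) pairs of an ASCII DXF file.  Group code 0
        # starts an entity (LINE, POINT, ENDSEC, ...); codes 10/20/30 carry the
        # X/Y/Z of the start (or only) point and 11/21/31 the end point of a LINE.
        with open(dxf_path) as f:
            raw = [line.strip() for line in f]
        pairs = list(zip(raw[0::2], raw[1::2]))
        lines, points = [], []
        entity, coords, in_entities = None, {}, False
        for code, value in pairs:
            if code == '0':
                if in_entities and entity == 'LINE':
                    lines.append(coords)
                elif in_entities and entity == 'POINT':
                    points.append(coords)
                entity, coords = value, {}
                if value == 'ENDSEC':
                    in_entities = False
            elif code == '2' and value == 'ENTITIES':
                in_entities = True
            elif in_entities and code in ('10', '20', '30', '11', '21', '31'):
                coords[code] = float(value)
        return lines, points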
Table 1 Extraction of points and lines


Figure 3 shows the result after the algorithm was tested using Matlab. The design code is extracted correctly according to the process flow and appears as below. The algorithm was also tested on single features.

Figure 3 Tested features
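For a rough, non-Matlab counterpart of this plotting step, the sketch below uses matplotlib to draw each extracted line segment in the XY plane; it assumes the list of LINE dictionaries produced by the reader sketched earlier, so the keys '10', '20', '11', and '21' are that sketch's convention rather than anything prescribed by the chapter.

    import matplotlib.pyplot as plt

    def plot_extracted_lines(lines):
        # Each entry holds the start (10, 20) and end (11, 21) X/Y coordinates
        # of one LINE entity; drawing them reproduces the wireframe of the part.
        for seg in lines:
            plt.plot([seg['10'], seg['11']], [seg['20'], seg['21']], 'b-')
        plt.axis('equal')
        plt.show()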


CONCLUSIONS

We have described methods and algorithms to extract and recognize object features from geometric information, and we have discussed the concept of the extraction process and its implementation. The extraction process is based on the design code and the geometric description, and the method is able to extract many shapes of features in the prismatic domain. Although the proposed method successfully analyzes 3D features that consist of many parts, much work still needs to be done in order to make it more functional. A limitation of the presented algorithm is that it does not handle curved objects in the code.

REFERENCES

A. ERSAN, S. ULVI AND NEDIM, (1997) Data Extraction from CAD for Rotational Parts to be Machined at Turning Centres, Tr. J. Engineering and Environment Science.
B. LI AND J. LIU, (2001) Detail Feature Recognition and Decomposition in Solid Model, Computer Aided Design.
C. K. MOK AND F. S. Y. WONG, (2006) Automatic Feature Recognition for Plastic Injection Moulded Part Design, Int J Adv Technol.
E. S. ABOUEL NASR AND A. K. KAMRANI, (2006) A New Methodology for Extracting Manufacturing Features from CAD System, Computer & Industrial Engineering.


H. SAKURAI AND D. C. GOSARD, (1990) Recognizing Shape Features in Solid Models, IEEE Computer Graphics & Applications.
L. Z. JIAN, LI, WANG, L. C. YUAN AND X. X. ZHI, (2004) Automatically Extracting Sheet-Metal Features from Solid Model, Journal of Zhejiang University SCIENCE.
M. M. MAREFAT, (1997) Hierarchical Bayesian Methods for Recognition and Extraction of 3D Shape Features from CAD Solid Models, IEEE Transactions on Systems, Man and Cybernetics.
M. MAREFAT AND R. L. KASHYAP, (1990) Geometric Reasoning for Recognition of Three Dimensional Object Features, IEEE Transactions on Pattern Analysis and Machine Intelligence.
M. P. GROOVER AND E. W. ZIMMERS, (1984) CAD/CAM: Computer-Aided Design and Manufacturing, Prentice-Hall, Englewood Cliffs.
M. WEINSTEIN AND S. MANOOCHEHRI, (1994) Geometric Influence of a Molded Part on the Draw Direction Ranges and Parting Line Locations, Advances in Design Automation.
MICHAEL BRADY, (1988) Robotics Science, University of Oxford.
N. AHMAD AND A. F. M. ANWARUL HAQUE, (2001) Manufacturing Feature Recognition of Parts Using DXF Files, 4th International Conference on Mechanical Engineering.
PAOLO DI STEFANO, (1997) Automatic Extraction of Form Features for Casting, Computer Aided Design.
R. R. MARTIN, (2000) Geometric Reasoning for Computer Aided Design, Department of Computing Mathematics, University of Wales College of Cardiff.
S. N. TRIKA AND R. L. KASHYAP, (1994) Geometric Reasoning for Extraction of Manufacturing Features in Iso-Oriented Polyhedrons, IEEE Transactions on Pattern Analysis and Machine Intelligence.


INDEX

accuracy, 2, 3, 16, 20, 56, 57, 58, 59, 63, 68, 73, 75, 76, 77, 78, 79, 83, 84, 139, 141, 142, 150, 155, 165, 174, 177, 195, 199, 206, 207 AcDbLine, 261, 262 AcDbPoint, 261, 262 activation function, 161 Algorithm, ii, 21, 26, 29, 33, 34, 36, 41, 42, 45, 46, 85, 86, 90, 93, 96, 99, 104, 107, 108, 119, 121, 123, 153, 177, 194, 207, 208, 209, 220, 226, 232, 244, 246, 248, 250, 253, 256 algorithm DR, 220 algorithm visualizes, 248, 250 alphabet, 53, 114, 115

Analysis, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 36, 37, 38, 39, 40, 41, 43, 44, 45, 46, 47, 48, 49, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 99, 100, 101, 102, 103, 104, 106, 107, 108, 109, 110, 154, 168, 210, 233, 256 Analysis of Form Images, 48, 110 Analysis of Segmentation Performance, 22, 86 Analytic Word Recognition Algorithm, 33, 46, 96, 108 applications, 19, 57, 68, 78, 113, 139 Approximate Beam Matching Algorithm, 29, 93 Architecture, 35, 98 Artificial Intelligence and Machine Learning (AIML), 254 Artificial Neural Networks, 21, 85 Automated Offline Handwriting Recognition System Incorporating, 31, 94

Automatic bank cheque processing, 36, 99 Automatic Character Recognition, 45, 107 Automatic segmentation, 48, 110

CEDAR Benchmark Database, 22, 86 CEDAR database, 59, 61, 63, 66 cell-representation, 236, 243, 250, 253 cells, 68, 151, 217, 226, 235, 236, 237, 248, 249, 250, 252 center character confidence (CCC), 61 center character validation (CCV), 65 center contour (CC), 64 chain code, 211, 212, 214, 215, 216, 219, 220, 231, 232, 235, 236, 238, 239, 240, 243, 248, 250, 252, 253, 254 chain code elements, 243 chain code processing algorithm, 254 Chains, 214 character bodies, 11 character classification, 78, 79 Character extraction Verma, 63 character model, 57, 70 character model word recognition approach, 57 character neural networks, 30, 93, 94 character recognition, 2, 20, 25, 30, 33, 38, 40, 48, 71, 75, 76, 84, 85, 89, 93, 96, 97, 101, 103, 110, 140 Character Recognition Methodologies, 39, 101 Character Recognition Systems, 27, 91 character segmentation, 2, 9, 38, 51, 59, 60, 67, 75, 101 character segmentation process, 51 characterization, 165, 168, 174 characters, 2, 3, 7, 8, 9, 10, 14, 15, 16, 17, 18, 44, 51, 54, 55, 56, 57,

background pixels, 52, 245 bagging algorithm, 40, 103 Based Algorithm for Slant Removal, 42, 104 base-fit line, 64 baseline, 8, 10, 11, 40, 54, 56, 57, 62, 65, 103 baseline, lower, 8 Baum-Welch Algorithm, 123 Baum-Welch learning process, 124 Bayes-Based Region-Growing Algorithm, 153 binary image, 7, 179, 233, 236, 243, 245, 246, 250, 252, 253, 254 blurring, 183 Brain, 153, 154, 163, 164, 174 Brain MR Images, 164, 174 Brain MR Images Segmentation Method Based, 164 bridges, 69 building, 23, 42, 88, 105

CAD, 255, 256, 257, 258, 260, 264, 265 CAD files, 256, 257 CAD system, 255, 256, 257 Caesar, 9, 12, 24, 25, 88, 89 calculation, 119, 120, 121, 133, 159, 203, 218, 220, 229 Candela, 179, 180, 193, 209 Casey, 2, 25, 51, 55, 70, 72, 76, 89 CEDAR, 22, 44, 57, 59, 61, 63, 65, 66, 86, 107

269

Contour, 6, 21, 47, 48, 85, 109, 110, 214, 254 Contour Character Extraction Approach, 47, 109 Contour Code Feature Based Segmentation, 47, 109 contrast, 139, 143, 165, 166, 171, 174, 183, 230 convex, 12, 56, 218, 226, 227, 228 Co-occurrence Matrix, 165 core, 8, 9, 10, 11, 18, 195, 196, 197, 198, 202, 203, 204, 205, 206 core point, 195, 196, 197, 198, 202, 203, 204, 205, 206 Core Point Detection, 202 core point location, 197, 198, 202, 203, 205, 206 core region, 9, 10, 11, 205 core region lines, 10 correct slant angle, 15 correction, 6, 10, 11, 12, 13, 14, 15, 16, 17 correlation, 139 counter, 226, 248, 250 CSF (Cerebrospinal Fluid), 173 CSP, 61 Cursive Character Recognition, 22, 86, 87 cursive characters, 75, 84 cursive handwriting, 1, 9, 20, 24, 31, 32, 33, 51, 55, 65, 70, 75, 83, 85, 88, 94, 96, 97, 113, 140 cursive handwriting recognition, 1, 24, 31, 32, 33, 51, 52, 55, 75, 83, 88, 94, 96, 97, 113, 140 cursive handwriting recognition process, 75 cursive handwriting segmentation, 9, 51, 70 Cursive Script, 23, 27, 30, 32, 34, 37, 44, 87, 91, 93, 95, 97, 100, 106

Cursive script recognition, 31, 95 Cursive Script Word Recognition, 23, 87 cursive text, 56 Cybernetics, 30, 41, 44, 48, 93, 94, 104, 106, 110, 265

distance, 58, 65, 68, 145, 146, 170, 212, 218, 231 distribution, 10, 12, 13, 14, 16, 61, 114, 115 document analysis, 2, 20, 25, 32, 39, 41, 89, 95, 102, 104 document images, 6, 18 DSS recognition algorithms, 220 DXF, 256, 257, 258, 260, 261, 262, 265 DXF code, 257, 258, 260, 262 DXF format, 256, 260, 261 dynamic programming (DP), 129 dynamic programming algorithms, 70

data file converter, 257 Decoding, 36, 99, 118, 121 Degraded Text Recognition, 22, 87 delta points, 196 descenders, 6, 9, 14, 15, 54, 77 descenders information, 15, 54 Design, 21, 23, 85, 87, 208, 264, 265 Detection of Curvature, 202, 204 dictionary, 71 differences, 19, 61, 156, 159 digital arc, 219, 220 Digital Image Processing, 194 digital line, 211, 212 Digitization, 4 digitization procedure, 18 digitizing, 213 digits, 2, 45, 67, 107, 240 dilation, 6, 146 Direction Connectivity, 247 directional codes, 185 Directional Filtering, 186, 194 Directional Image, 177, 193 Directional Image Partitioning, 193 Directional masks, 181 directional windows, 185, 186 directions connectivity, 247, 248 discontinuities, 144, 196, 199 discrete geometric estimators, 211 discrete geometry, 211, 232 discretization, 212, 213, 231 discretization schemes, 212 dissection, 53, 54, 56, 64, 72

edges, 16, 218, 228, 256 Effective Cursive Word Segmentation Method, 40, 103 Efficient off-line cursive handwriting word recognition, 31, 94 eigenvector, 14 element, 69, 172, 177, 181, 184, 212, 215, 235, 237, 239, 243, 246, 248 element position, 235, 237, 239, 248 energy, 158 enhanced image, 148, 150, 183, 199 enhanced MRI slice, 151 Enhanced Outlier Rejection Ability, 37, 100 enhancement, 18, 45, 107, 141, 148, 175, 176, 183, 199, 206 Enhancing cursive word recognition performance, 43, 105 entities, 69, 256, 257, 258, 261

error prone process, 51, 176 error term, 159 errors, 17, 45, 62, 73, 84, 107, 158, 176 estimate, 11, 13, 17, 54, 59, 133, 136, 137, 178, 180, 202, 212, 218, 229 estimation, 9, 10, 11, 12, 14, 15, 18, 28, 68, 124, 125, 130, 131, 133, 134, 168, 202, 206, 211, 218, 230, 231, 232 Estimators, 229, 232 Evaluation, 28, 118, 119, 194 Evaluation Problem, 118, 119 experiments, 5, 61, 63, 65, 67, 80, 207 Extension to Phrase Recognition, 35, 98 extract, 57, 69, 176, 196, 211, 212, 256, 259, 261, 263, 264 extracting, 56, 142, 144, 236, 256, 257, 258, 260, 265 Extracting WM, 149 extraction, 4, 9, 22, 25, 34, 40, 48, 59, 64, 65, 66, 75, 77, 84, 86, 89, 97, 103, 110, 141, 176, 195, 255, 256, 257, 258, 261, 264, 265 extraction algorithm, 195, 256, 257, 258 extraction process, 256, 264


Fast script word recognition, 43, 106 Faster Optical Character Recognition, 26, 89 feature extraction process, 20, 184 feature recognition algorithm, 257 Feature Selection, 40, 41, 103, 104 Feature selection algorithms, 32, 96

Feature Selection for Ensembles, 41, 103, 104 feature size, 211, 231 feature vectors, 57, 138, 158, 170, 171, 172 Features III- Frontiers, 32, 95 files, 258 filter function, 183, 187 filter shifts, 184 Fingerprint Classification, 194, 207, 208, 209, 210 Fingerprint Enhancement, 194, 209 Fingerprint identification, 175 fingerprint identification system, 175, 176, 196 Fingerprint image reconstruction process, 191 fingerprint images, 175, 176, 177, 183, 195, 196, 197, 198, 199, 200, 202, 205, 206 fingerprint matching, 176, 195, 196 fingerprint recognition, 177 Fingerprints, 193, 194, 207, 209, 210 fingerprints images, 186 finite state network (FSN), 70 fluctuation, 185 foreground pixels, 10, 52 formula, 13, 126, 130, 131, 133, 242 Forward Algorithm, 119 Freeman chain code, 215, 216, 231 Freeman Chain Code (FCC), 235 Frequency Domain, 182, 183 frontiers, 2, 37, 100 furrows width, 190 fusion, 20, 61, 79, 85

generalized projections (GP), 14

geometric information, 255, 256, 257, 260, 264 geometric reasoning, 255 Geometry, 202, 205, 233, 256 Gradient, 25, 126 Grapheme-Based Segmentation Technique, 37, 100 graphemes, 55, 70, 71 Graphics, 29, 43, 46, 93, 108, 233, 265 gray levels, 143, 172 Grey Matter (GM), 141 grid points, 219, 220, 221, 222, 223 groups, 16, 55, 211, 257 GUMLP, 60, 61

34, 35, 36, 37, 38, 39, 41, 42, 43, 44, 47, 48, 49, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 108, 110, 140 handwritten character recognition, 33, 97 Handwritten Digit Recognition, 30, 94 handwritten numeral recognition, 23, 24, 33, 88, 96 Handwritten Numeral Strings, 23, 87 handwritten numerals, 25, 26, 39, 49, 89, 90, 102, 111 handwritten text, 15, 67, 80 Handwritten Text Recognition Systems, 35, 98 Handwritten Word Recognition, 21, 22, 25, 26, 29, 35, 36, 37, 38, 48, 86, 89, 90, 93, 98, 99, 100, 101, 110, 140 handwritten words, 4, 18, 28, 37, 41, 56, 57, 58, 62, 66, 67, 80, 100, 104 heuristic algorithm, 59, 61, 62 heuristics, 12, 16, 18, 52, 55, 70 hexagonal, 216, 217, 235, 236, 250, 253 hexagonal cell, 236 Hidden Markov, i, 22, 25, 26, 31, 36, 40, 42, 49, 67, 69, 70, 87, 89, 90, 95, 99, 103, 105, 111, 113, 114, 139, 140 Hidden Markov Model Based Word Recognition, 36, 99 hidden Morkov model, 25, 26, 89, 90 Hierarchical Multi-Objective Genetic Algorithm Approach, 41, 104 Histogram Modelling, 182, 183

Handwriting, 1, 3, 4, 21, 22, 23, 24, 26, 27, 28, 29, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 45, 46, 47, 49, 77, 79, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 140 handwriting images, 47, 109 handwriting recognition, 2, 3, 4, 15, 25, 31, 37, 42, 45, 49, 54, 64, 79, 80, 83, 89, 100, 104, 107, 111, 113, 138 handwriting recognition strategies, 3 Handwriting Recognition Systems, 45, 47, 107, 109 Handwriting recognition techniques, 77 handwriting segmentation, 6 Handwriting Segmentation Context, 31, 94 Handwritten, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,

histograms, 12, 16, 183 homogeneity, 143, 156, 171 horizontal direction, 11 horizontal projection profile (HPP), 7 human brain, 155, 156, 163 hybrid approach, 16, 52, 60 Hybrid Contextual Text Recognition, 44, 106 Hybrid Off-Line Cursive Handwriting Word Recognition, 31, 94 Hypergraph imaging, 23, 24, 87, 88 Hypergraph model, 66


Image, 22, 24, 26, 27, 28, 33, 34, 35, 36, 40, 42, 43, 44, 46, 48, 87, 88, 90, 91, 92, 97, 98, 99, 103, 104, 105, 106, 108, 110, 140, 145, 146, 148, 149, 152, 153, 154, 163, 168, 174, 188, 193, 194, 207, 208, 209, 232, 233, 235, 246, 251 Image Analysis, 24, 33, 40, 44, 88, 97, 103, 106, 208 image characteristics, 200 image enhancement, 141, 176 Image Matrix Based Feature Extraction Algorithms, 48, 110 Image Processing, 22, 26, 27, 28, 33, 35, 42, 43, 46, 87, 90, 91, 92, 97, 98, 105, 108, 140, 154, 194, 208 image processing techniques, 165 Image representation, 235 image segmentation, 143, 152, 156, 157, 163 Image thresholding techniques, 152 Image Vision Computing, 35, 98

imaging, 23, 24, 87, 88, 142, 153, 155, 156, 174, 183 Imaging and Image Processing, 22, 86 Implicit Segmentation, 25, 89 implicit segmentation-based methods, 67 Improvement, 23, 28 Independent Component Analysis Segmentation Algorithm, 26, 90 Information Processing, 28, 47, 92, 109, 210 initialize, 125, 226, 230 inner boundary, 240, 241, 253 Input, 3 input data, 79, 258 input fingerprint images, 176 input word, 70, 71 integers, 220, 223, 227 Integrated segmentation, 39, 102 intensities, 143, 172 Intensity Correction, 149 Interface, 252 invariant, 27, 91, 177, 215, 236, 240, 249 Inverse difference, 171 Inverse Fast Fourier Transform (IFFT), 183 isolated characters, 24, 70, 75, 80, 88 iterative cross section sequence graph (ICSSG), 67

junction points, 7

Kmeans algorithm, 145, 149 K-Means clustering method, 150 knowledge, 3, 51, 59, 72, 139, 145, 211

Knowledge-based English Cursive Script Segmentation, 48, 110

Matlab software, 259, 260 matrices, 134 maximum likelihood, 121, 139 Maximum Mutual Information (MMI), 122 medical images, 141, 143, 163, 174 Methodology, 29, 35, 93, 98, 147, 158, 243, 264 methodology process, 167 minima, 11, 12, 56, 65 minutiae, 175, 176, 178, 184, 194, 195, 196 minutiae extraction, 176 misclassification, 145, 146 MLP algorithm, 228, 229 MLP vertex, 228, 229 Model, 22, 25, 29, 32, 39, 46, 87, 89, 93, 95, 102, 108, 114, 139, 163, 209, 264 model parameters, 118, 123, 126 Modeling and Recognition of Cursive Words, 26, 90 modifications, 207 Morphological Cursive Word Segmentation, 46, 108 Morphological Methods in Image and Signal Processing, 25, 89 MR image segmentation methods, 156 MR image segmentation techniques, 152 MR images, 141, 155, 156, 158, 163, 173 MR medical image characteristic, 173 MRI, 141, 143, 144, 151, 153, 154, 155, 164, 165, 166, 168, 169, 170, 171, 172, 174 MRI brain dataset, 166, 168 MRI brain images segmentation, 164

Large Vocabulary Handwriting Recognition, 36, 99 Large Vocabulary Off-line Handwriting Recognition, 36, 99 L-Block, 250, 251, 253 Length, 42, 105, 218, 231, 233, 240 Length Estimators, ii, 218, 233 letters, 1, 9, 54, 55, 56, 59, 70 level, 19, 23, 37, 41, 42, 72, 80, 84, 88, 100, 104, 105, 142, 170, 171, 172, 179, 185, 256, 258 Lexicon, 35, 36, 44, 98, 99, 106 lexicon size, 57, 138 Lexicon-Directed SegmentationRecognition Procedure, 36, 99 Ligature detection, 65 Ligatures, 45, 56, 107 Line removal, 6 literature, 1, 6, 11, 15, 19, 51, 52, 70, 71, 75, 78, 79, 83, 122, 142, 212 local Ridge Orientation (LRO), 183 locate, 54, 56, 57, 59, 197, 198, 199, 206 logProb, 137, 138 Low-Pass Filtering, 178, 179

Machine Recognition of Cursive Writing, 28, 92 Machine Recognition of Handwritten, 43, 105 magnetic resonance imaging, 153 Manual segmentation, 141, 142 MAPPING ALGORITHM, 243 mapping algorithm of rectangular vertex chain code, 235, 243, 252

multi-objective genetic algorithms, 41, 103 Multiple Handwritten Word Recognition Techniques, 47, 109 Multi-resolution approach, 198 MUMLP, 60, 61 MURBF, 60, 61


neighboring pixels, 7, 58, 157, 169, 170, 215 Neural Based Segmentation of Cursive Words, 26, 90 neural confidences, 61 Neural Information Processing Letters, 164 Neural Information Processing Systems, 34, 97 Neural Network Based Segmentation Handwritten Words, 28, 92 Neural network recognition, 44, 106 Neural Networks, 21, 22, 24, 26, 27, 30, 37, 38, 43, 46, 47, 86, 88, 90, 91, 93, 100, 101, 106, 108, 109, 140, 163 Neural Networks for Signal Processing, 47, 109 nodes, 68, 122, 217, 236 noise, 5, 6, 18, 19, 143, 144, 148, 150, 156, 178, 182, 183, 187, 190, 197, 198, 199, 200, 215 noise blocks, 19 noise components, 18 noise effects, 190 noise of orientation field image, 197, 198 Noisy and Enhanced Image, 152 noisy image, 150, 151

Nonparametric Approach, 230 non-segmentation, 29, 92 norm, 172 Normalization Techniques, 48, 110 Nouvel Algorithme, 32, 95 number, 1, 10, 12, 16, 54, 56, 57, 58, 64, 68, 69, 70, 71, 72, 77, 114, 119, 125, 134, 138, 156, 172, 176, 186, 189, 206, 216, 217, 218, 219, 230, 235, 236, 237, 239, 240, 242, 245, 248 numer, 136, 137 Numeral, 33, 42, 96, 104

object, 20, 85, 143, 145, 203, 212, 214, 245, 255, 256, 257, 258, 260, 261, 264 Object Features, ii, 265 object recognition, 20, 85, 214 octant, 222, 223, 226 off-line, 2, 3, 4, 35, 44, 46, 48, 55, 59, 70, 75, 78, 80, 84, 98, 106, 108, 109 off-line cursive handwriting recognition, 2, 3, 44, 84, 106 Offline Cursive Handwriting Recognition System, 46, 108 Off-line Cursive Handwriting Recognition System, 43 Off-line Cursive Handwriting Recognition System, 106 Off-line Cursive Script Recognition, 34, 97 Off-line Cursive Script Word Recognition, 22, 45, 87, 107 Off-Line Cursive Word Recognition, 44, 106 Offline Handwriting Recognition, 46, 108

Off-line recognition of handwritten postal words, 24, 88 On-line, 3, 45, 107, 140 On-line Handwriting Recognition, 45, 107 Optical character recognition, 20, 85 optimization criteria, 122 order HMMs, 116 orientation, 11, 16, 19, 176, 177, 178, 179, 181, 185, 195, 196, 197, 198, 199, 202, 204, 205, 206, 226, 257 orientation field, 177, 196, 197, 198, 199, 202, 204, 205 orientation field image, 197, 199 orientation image, 178, 197, 198 Output, 117 output node, 159, 161 over-segment, 57, 59, 62, 64, 70, 72, 73, 84 over-segmentation, 64, 70

Pattern Recognition Letters, 20, 30, 32, 33, 35, 41, 42, 45, 47, 48, 85, 94, 96, 97, 98, 104, 105, 107, 109, 110, 232, 233 pattern recognition methods, 165 pattern recognition problem, 77 pixel distance, 170 pixels, 7, 10, 52, 58, 66, 143, 145, 157, 169, 170, 172, 180, 181, 182, 184, 185, 186, 204, 213, 214, 215, 218, 241, 245 planes, 212, 220 Plot, 261, 262 Poincare index (PI), 197 Poincare index method, 197, 207 Poincare method, 198 pointer, 122 Polygonal approaches, 219 post processing operations, 145 preprocessing, 5, 6, 8, 10, 14, 15, 20, 83, 146 Pre-processing process, 243 preprocessing stage, 6 pre-processing step, 183, 245 preprocessing tasks, 5 preprocessing techniques, 6, 8, 15, 20 pre-segmentation, 54, 55, 64, 68, 72 Principal Components Analysis, 44 probability, 71, 78, 114, 115, 118, 119, 120, 121, 122, 123, 124, 128, 129, 170 probability distribution, 114, 115 problems, 2, 6, 8, 18, 38, 56, 72, 75, 83, 84, 101, 117, 118, 121, 143, 156, 176 Proc, 21, 28, 29, 30, 31, 32, 38, 42, 43, 85, 91, 92, 93, 95, 96, 100, 105, 153, 194, 208 process, 2, 3, 6, 7, 9, 10, 11, 14, 52, 53, 54, 58, 59, 63, 68, 71, 72, 76,

path, 58, 64, 65, 68, 69, 70, 129, 130, 215 Pattern Analysis, 20, 22, 24, 25, 26, 29, 30, 33, 35, 36, 37, 38, 39, 41, 42, 43, 44, 45, 49, 85, 87, 88, 89, 90, 92, 93, 94, 96, 97, 98, 99, 100, 101, 102, 103, 105, 106, 107, 153, 194, 207, 209, 232, 265 Pattern Recognition, 20, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 45, 46, 47, 48, 49, 85, 87, 88, 89, 90, 91, 92, 94, 96, 97, 98, 100, 101, 102, 103, 104, 105, 107, 108, 109, 110, 194, 207, 208, 209, 210, 232, 233, 254

122, 125, 141, 142, 145, 150, 156, 157, 158, 184, 190, 198, 199, 204, 206, 212, 223, 228, 231, 243, 252, 256, 257, 258, 260, 263, 264 process computational, 14, 63 projection profile, 7, 8, 11, 14, 16, 77 projections, 12, 13, 14, 16 propagation, 60, 61, 158


quality, 175, 176, 177, 190, 195, 197, 198, 206

Recent Advances, 22, 87 Recognition, 4, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 67, 70, 79, 81, 83, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 140, 194, 208, 210, 232, 233, 264, 265 Recognition enhancement, 45, 107 recognition performances, 80 recognition processes, 67 recognition rate, 56, 57, 61, 69, 71, 78 recognition results, 1, 71 recognition system, 6, 20, 41, 48, 56, 58, 80, 104, 110 recognition tasks, 118 recognition techniques, 60, 61, 78 Recognition-directed, 47, 109 recognizer, 8, 58, 70 rectangle, 238, 239, 246, 247, 248 rectangle cells, 248

rectangular, 169, 216, 217, 235, 236, 238, 239, 240, 243, 246, 247, 248, 250, 252, 253 rectangular cells, 236, 238, 240, 246, 248, 252, 253 rectangular VCC, 247, 249, 250, 252 rectangular-VCC cells, 243, 252, 253 recursions, 124 Re-estimate, 125, 136 reference lines, 8, 9, 10, 12, 14, 63 Reference lines detection, 8 reference points, 196, 198 region, 6, 8, 9, 10, 11, 18, 43, 68, 105, 144, 145, 155, 157, 158, 177, 185, 196, 198, 205, 206, 213, 214, 215, 238 regularities, 69 Relaxation procedures, 69 reliability, 79, 142, 143, 155, 158, 195, 197, 199 remainders, 8 removal, 5, 6, 7, 8, 9, 11, 14, 19, 182, 190 Research, 34, 39, 47, 97, 102, 109 researchers, 1, 6, 18, 54, 55, 72, 73, 77, 78, 79, 138, 156, 197, 199, 200, 202, 256 restoration, 7, 8 ridge lines, 177, 186 ridge orientations, 176 ridges, 175, 176, 177, 185, 186, 190, 202, 206 rotation, 199, 215, 216, 236, 240, 253 row-1, 247 Rule Based Neural Network Validated Segmentation, 31, 94 rules, 6, 10, 52, 55, 56, 58, 59, 62, 64, 68, 71, 78, 245, 247, 248, 256, 257, 258


segmentation-free hidden Morkov modeling, 39, 102 segmentation-recognition, 52 segmented handwritten characters, 22, 86 segregated character, 59 selection, 41, 42, 75, 76, 77, 84, 103, 105, 144, 256 Sentence Level Recognition Scheme, 32, 95 Sequence, 27, 91 set, 10, 12, 30, 32, 52, 55, 56, 57, 60, 62, 64, 68, 69, 70, 72, 75, 76, 77, 79, 93, 94, 96, 114, 122, 125, 140, 156, 173, 184, 186, 214, 219, 221, 227, 258 shear, 13, 16, 17 shear angles, 16, 17 Signal Processing, 25, 47, 89, 109, 140, 152, 163, 194, 209 singular points, 178, 195, 196, 197, 199, 201, 207 singular points location, 199 singularities, 69, 197, 198 skew angle, 11, 13, 14 Skew correction, 10 skew detection, 5, 12, 14 Skew estimation, 10 skull, 146 slant, 5, 9, 12, 15, 16, 17, 18, 23, 28, 45, 46, 107 slant correction, 16, 17, 28, 45, 46, 107 Slant estimation, 6, 15, 35 slit sums, 180, 181, 182 slope, 10, 11, 12, 59, 220, 222, 224, 225 slope angle, 12 Smoothing Filter, 182, 183 Solid Models, 265 Spatial Domain, 182, 190 spatial information, 157, 172

scale, 3, 130, 133, 135, 136, 148, 179, 215 Scaling, 6, 18, 130 Scanning procedure, 57 Scheme, 32, 95 SCIA, 232, 233 script, 6, 8, 35, 47, 51, 54, 55, 56, 57, 59, 60, 64, 67, 69, 71, 98, 109 Script recognition, 20, 85 sector, 228, 229 segment point validation (SPV), 64, 65 segmentation, 2, 4, 6, 18, 20, 26, 29, 30, 39, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 83, 84, 90, 92, 93, 102, 113, 118, 140, 141, 142, 143, 144, 150, 153, 155, 156, 158, 163, 164, 165, 173, 212 segmentation accuracy, 63, 73, 84 segmentation algorithm, 52, 56, 57, 59, 68, 69, 73, 84 Segmentation Determined, 48, 110 Segmentation error rates, 66 segmentation errors, 2, 61, 72, 75 Segmentation graph, 70 Segmentation methods, 30, 93 segmentation path direction (SPD), 64 segmentation points, 54, 56, 57, 58, 59, 61, 63, 65, 73, 84 Segmentation problem, 63 segmentation procedure, 70 segmentation process modified direction features, 64 segmentation processes, 142 Segmentation Results, 64 segmentation technique, 58, 59, 60

Speech Recognition, 42, 105, 140 split, 17, 55, 57, 58, 69, 233 SPV technique of Blumenstein and Verma, 61 start, 215, 223, 228, 261, 262 start point, 261, 262 start vertex, 228 starting point, 64, 65, 215, 216, 236, 240, 249 states, 114, 116, 122, 129 statistical approaches, 199 steps, 4, 10, 59, 65, 165, 176, 177, 186, 202, 206, 218 Stereological Approach, 230 Stock, 179, 194 straight line segments, 214 straight lines, 7, 14, 212, 213, 218, 221, 222 strokes, 4, 7, 9, 10, 12, 15, 16, 17, 54, 57, 58, 59, 69, 77 structuring elements, 60, 69 subject, 172 surface areas, 230, 232 SVM, 46, 108, 153, 173 Symmetry, 208


tissues, 141, 142, 145, 146, 150, 151, 155, 156 touch, 216, 235, 237, 239, 240, 248 Touching Characters, 25, 89 training samples, 78, 173 transcribes, 236, 248 transcribing, 236, 243, 248, 250, 253 transcribing algorithm, 236, 244, 248, 250, 253 transformation based learning (TBL), 68 triangular, 216, 217, 235, 236 tumor, 141, 142, 143, 151, 156, 165, 166, 168, 173, 174 Unconstrained handwritten character recognition, 33, 96 Underflow, 129 underline removal, 6, 7, 8, 19 user interaction, 142, 144

tangent vector integration process, 229 text recognition, 29, 92 Texture Analysis, 165, 168 texture characteristics, 174 theorem, 219, 221, 231 thickness, 7, 8, 58 thinned binary image, 236, 243, 245, 246, 250, 252, 253, 254 Thinning Algorithms, 45, 107 Thinning process, 245 threshold, 6, 10, 19, 125, 143, 199, 214, 245

Validation, 27, 65, 90, 248, 250 validation algorithm, 236, 243, 244, 248, 250, 252, 253 validation process, 243 values, 12, 61, 64, 71, 76, 125, 132, 133, 134, 145, 157, 158, 170, 180, 183, 203, 223, 236, 245 Variable Duration, 26, 90 variables, 120, 123, 124, 246 Variance, 46, 108, 172 variant-slanted words, 17 VCC cells, 235, 238, 243, 252, 253 vector, 25, 27, 33, 48, 78, 89, 91, 97, 109, 115, 172, 178, 179, 225, 229 verification process, 195 vertex, 212, 226, 227, 228, 229, 230, 235, 236, 238, 239, 240, 243, 248, 250, 252, 253, 254

vertices, 212, 216, 217, 218, 227, 228, 229, 235, 236, 237, 239, 240, 248, 256 Visual Image Signal Processing, 194 Visualization, 22, 43, 86, 163 Viterbi algorithm, 113, 121, 139 voxel, 156, 172, 173


Weights, 218 White Matter Lesions, 154, 174 Wigner-Ville distribution, 12, 16 word image, 8, 9, 12, 14, 15, 17, 19, 51, 55, 58, 62, 68, 69, 72, 138 word image preprocessing, 15 word recognition, 2, 5, 24, 25, 26, 27, 28, 29, 30, 32, 34, 37, 39, 45, 46, 48, 55, 57, 59, 71, 73, 75, 77, 78, 80, 84, 89, 90, 91, 92, 93, 94, 95, 96, 97, 100, 102, 107, 108, 109, 139, 140 word recognition process, 73, 84 word recognition rates, 77 word recognition system, 5, 32, 57, 59, 80, 95 word segmentation, 2, 53, 56, 59, 60, 70, 72
