
Text Line Segmentation and Word Recognition in a System for General Writer Independent Handwriting Recognition

U.-V. Marti and H. Bunke
Institut für Informatik und angewandte Mathematik, Universität Bern, Neubrückstrasse 10, CH-3012 Bern, Switzerland
email: {marti,bunke}@iam.unibe.ch
February 5, 2001

Abstract

In this paper we present a system for recognizing unconstrained English handwritten text based on a large vocabulary. We describe the three main components of the system, which are preprocessing, feature extraction and recognition. In the preprocessing phase the handwritten texts are first segmented into lines. Then each line of text is normalized with respect to skew, slant, vertical position and width. After these steps, text lines are segmented into single words. For this purpose distances between connected components are measured. Using a threshold, the distances are divided into distances within a word and distances between different words. A line of text is segmented at positions where the distances are larger than the chosen threshold. From each image representing a single word, a sequence of features is extracted. These features are input to a recognition procedure which is based on hidden Markov models. To investigate the stability of the segmentation algorithm, the threshold that separates intra- and inter-word distances from each other is varied. If the threshold is small, many errors are caused by over-segmentation, while for large thresholds errors from under-segmentation occur. The best segmentation performance is 95.56% correctly segmented words, tested on 541 text lines containing 3899 words. Given a correct segmentation rate of 95.56%, a recognition rate of 73.45% on the word level is achieved.

Keywords: handwriting recognition, text line to word segmentation, word recognition, hidden Markov models.

1 Introduction
During the last years handwriting recognition has become an intensive research topic. While the first systems read segmented characters [1], later efforts aimed at the recognition of cursively handwritten words [2]. Only a short time ago the first systems appeared which are able to read sequences of words [3, 4, 5]. But these systems operate in small, very specific domains. Only very few systems are known which address the domain of general handwritten text recognition. Among these systems, two fundamentally different approaches can be observed. In [6, 7] lines of text are segmented into individual words during a preprocessing phase, while the method described in [8] is totally segmentation free, i.e. the segmentation is integrated into, and obtained as a byproduct of, the recognition process. In the present paper we propose a system that follows the first approach, i.e. it segments complete lines of handwritten text into single words. Segmenting a handwritten text into lines, and particularly segmenting it further into single words, is difficult and error-prone, because it is a pure bottom-up process and no high-level information about the text is available. The word segmentation algorithm used in our system incorporates some ideas from [9, 10]. For word recognition, hidden Markov models over a given vocabulary are used. In our earlier work we have developed a segmentation-free recognizer [8]. But as general handwritten text recognition is a difficult task, it can be expected that the performance of a system can be boosted through multiple classifier combination. Because the segmentation based recognition algorithm described in this paper is very different from the one introduced in [8], it can be assumed that the integration of both leads to an improvement of the recognition rate. In a series of experiments the segmentation performance of the proposed method was tested based on the number of under- and over-segmentations. Also the recognition

rate on the word level was measured. The data used are part of the IAM-database described in [11]. In total, 59 pages of handwritten text were used, containing 3899 word instances in 541 lines. In the next section we show how the data are preprocessed and how text lines are segmented into single words. The features and their extraction for recognition are described in Section 3. In Section 4 the recognition procedure is introduced. Experimental results obtained with this system are presented in Section 5. At the end, in Section 6, we draw some conclusions from our work.

2 Preprocessing - Text Line Segmentation


The data input to the system are images of complete pages of handwritten text from the IAM-database [11]. An example is shown in Fig. 1. First the text is split into individual lines, which are normalized with respect to skew, slant, vertical position and width. For details of these procedures see [8]. An example is shown in Figs. 2 and 3. To segment a text line into single words, connected components are extracted. The case where two different words, or parts of different words, touch each other, resulting in a single component, is rather rare. But it often happens that a word is split into several connected components. For an example see Fig. 3. Hence the goal is to cluster the connected components of a text line such that each cluster corresponds to exactly one word. Once the connected components have been determined, their convex hulls and centers of gravity are computed. Then for each pair of connected components, c1 and c2, the straight line segment s that connects the center of gravity of c1 with the center of gravity of c2 is considered. The distance d between the two points where s intersects the convex hulls of c1 and c2 is determined. This distance is assigned to the pair as a weight. A graphical illustration is shown in Fig. 4. By means of this procedure, a completely connected and weighted graph is obtained, where each node corresponds to a connected component in the image and the weight on an edge represents the distance between two connected components. Given such a graph, its minimum spanning tree is computed [12]. For an example see Fig. 5.
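The graph construction and minimum-spanning-tree step described above can be sketched as follows. For brevity this sketch represents each connected component by its center of gravity only and uses the plain Euclidean distance between centers as the edge weight; the actual system measures the distance between the convex hulls along the connecting segment. The sample coordinates are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph whose edge weights are the
    Euclidean distances between component centers.  Returns a list of
    (i, j, weight) tree edges."""
    n = len(points)
    # best[j] = (cheapest distance from j to the tree, tree node realizing it)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        for k in best:  # relax remaining nodes against the new tree node j
            dk = math.dist(points[j], points[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges

# Centers of gravity of five connected components (hypothetical values);
# the small gaps are intra-word, the one large gap separates two words.
centers = [(5, 10), (9, 11), (30, 10), (34, 9), (38, 11)]
mst = minimum_spanning_tree(centers)
weights = sorted(round(w, 1) for _, _, w in mst)
```

On this toy input the tree has four edges, three short intra-word edges and one long inter-word edge, which is exactly the structure the thresholding step of Section 2 exploits.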

Figure 1: Image of a text page.

Figure 2: Image of an uncorrected line of text (Fig. 1, line 3 from bottom).

To segment the text line under consideration into words, a threshold has to be determined to distinguish pairs of connected components that belong to one word from those belonging to two separate words. For this purpose, the line width w, the median stroke width s and the median distance d between two vertical strokes are considered. The line width is defined as follows:

w = max{x : p(x, y) is black} − min{x : p(x, y) is black}    (1)

It represents the horizontal distance between the leftmost and rightmost black pixel in the considered line of text. The median stroke width s is defined according to the following formula:

Figure 3: Image of the preprocessed and normalized text line in Fig. 2.


Figure 4: Distance between two connected components.

s = median{h(y) : 0 ≤ y < H}    (2)

using the horizontal projection histogram h(y). It represents the median of the number of black pixels, taken over all rows of the image of the considered text line. The third value,
d, is the median distance between two vertical strokes in the row with the maximum number of black-white transitions. In other words, d is equal to the median length of the white run-lengths in the image row where most black-white transitions occur. Finally, the threshold θ for line segmentation is given by the following formula:

θ = α · d    (3)

where α is a scaling constant that needs to be determined experimentally. By deleting connections in the minimum spanning tree which are longer than the threshold θ, the text line is segmented into single words (see Figs. 5 and 6). For each text line, the threshold is computed separately.
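A rough sketch of the per-line measurements and of the final cutting step, under stated assumptions: the line image is a small 0/1 array (1 = black), the helper names (`line_statistics`, `segment`) and all sample data are our own, and the exact combination of the measured quantities into the threshold of eq. (3) is left to the caller as a precomputed value.

```python
import statistics

def line_statistics(img):
    """img: 2-D list of 0/1 pixel values (1 = black) for one text line.
    Returns (w, s, d): the line width of eq. (1), the median number of
    black pixels per row of eq. (2), and the median white run length in
    the row with the most black-white transitions."""
    xs = [x for row in img for x, v in enumerate(row) if v]
    w = max(xs) - min(xs)
    s = statistics.median(sum(row) for row in img)
    busiest = max(img, key=lambda r: sum(a != b for a, b in zip(r, r[1:])))
    runs, run, seen_black = [], 0, False
    for v in busiest:
        if v:
            if seen_black and run:
                runs.append(run)   # a white gap bounded by strokes
            run, seen_black = 0, True
        elif seen_black:
            run += 1
    d = statistics.median(runs) if runs else 0
    return w, s, d

def segment(n, mst_edges, theta):
    """Delete MST edges longer than theta; the surviving connected groups
    of components are the hypothesized words (union-find over n nodes)."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j, weight in mst_edges:
        if weight <= theta:
            parent[find(i)] = find(j)
    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())

# Tiny synthetic line image and a hand-made spanning tree:
img = [[0, 1, 1, 0, 0, 0, 0, 1, 1, 0],
       [0, 1, 0, 1, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
stats = line_statistics(img)
words = segment(5, [(0, 1, 4.1), (1, 2, 21.0), (2, 3, 4.1), (3, 4, 4.5)], 10.0)
```

With a threshold of 10.0 only the long edge is cut, leaving two clusters of components, i.e. two words.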

3 Feature Extraction
The hidden Markov models used for recognition expect as input a sequence of feature vectors for each unknown word to be recognized. To extract such a sequence of feature

Figure 5: Image of the convex hulls and the minimal spanning tree for Fig. 3.

Figure 6: Resulting segmentation of the text line in Fig. 3.

vectors from a word image, a sliding window is used. A window of one column width and the word's height is moved from left to right over the word to be recognized. At each position of the window nine geometrical features are determined. The first three features are the number of set pixels in the window, the center of gravity, and the second-order moment. This set characterizes the window from a global point of view: it describes how many pixels lie in which region of the window, and how they are distributed. For a more detailed description of the window, features four to nine give the positions of the uppermost and lowermost contour pixels in the window, the orientations of the upper and lower contour, the number of black-white transitions in the vertical direction, and the number of black pixels between the upper and lower contour. To compute the orientation of the upper and lower contour, the contour pixels of the neighboring windows to the left and to the right are used. Notice that all these features can easily be computed from the word images. However, to make the features robust against different writing styles, careful preprocessing as described in [8] is necessary.
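The nine window features can be sketched as follows for the one-pixel-wide window described above. The exact normalizations and moment definitions are guesses, and the contour orientations are approximated by central differences over the neighboring columns, as the text suggests.

```python
def column_features(img):
    """img: 2-D list of 0/1 pixel values (1 = black), rows top to bottom.
    Returns one 9-dimensional feature vector per column: pixel count,
    center of gravity, second-order moment, top and bottom contour
    positions, their orientations (central differences over neighboring
    columns), vertical black-white transitions, and the number of black
    pixels between the two contours."""
    h, w = len(img), len(img[0])
    cols = [[img[y][x] for y in range(h)] for x in range(w)]

    def basic(col):
        ys = [y for y, v in enumerate(col) if v]
        if not ys:
            return (0, 0.0, 0.0, h, -1, 0, 0)
        n = len(ys)
        mean = sum(ys) / n                     # center of gravity
        second = sum(y * y for y in ys) / n    # second-order moment
        top, bottom = min(ys), max(ys)
        trans = sum(a != b for a, b in zip(col, col[1:]))
        between = sum(col[top:bottom + 1])
        return (n, mean, second, top, bottom, trans, between)

    per_col = [basic(c) for c in cols]
    feats = []
    for x in range(w):
        n, mean, second, top, bottom, trans, between = per_col[x]
        left = per_col[max(x - 1, 0)]
        right = per_col[min(x + 1, w - 1)]
        # contour orientation as the slope of the top/bottom contour
        d_top, d_bottom = right[3] - left[3], right[4] - left[4]
        feats.append([n, mean, second, top, bottom,
                      d_top, d_bottom, trans, between])
    return feats

# A 3x3 toy glyph:
glyph = [[0, 1, 0],
         [1, 1, 0],
         [0, 1, 0]]
vectors = column_features(glyph)
```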

4 Recognition
Hidden Markov Models (HMMs) have been widely used in the field of handwriting recognition [13]. In systems with a small vocabulary and a sufficient number of training instances of each word, it is possible to build an HMM for each word. But for large vocabularies with only few training instances per word, not enough information is available for a good parameter estimation. Therefore, in our system an HMM is built for each character. The use of character models allows training data to be shared: each instance of a letter in the training set has a direct impact on the training of one of the models and leads to a better parameter estimation. To achieve optimal recognition results, the character HMMs have to be fitted to the problem. In particular the number of states, the possible transitions and the output probability distributions have to be chosen. Based on empirical studies it was decided to use 14 states per character model, organized in a linear fashion with continuous output distributions. For training the Baum-Welch algorithm [14], applied on whole text lines, is used¹. In the recognition phase the character models are concatenated to words according to the underlying vocabulary. It is assumed that each feature sequence extracted from an image contains exactly one word. Using the Viterbi algorithm [14] the probability of each word model having generated the given feature sequence can be computed. The word model with maximum probability is taken as the recognition result. The implementation of the word recognizer is based on the HTK software tool [15].
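A toy version of the concatenation-plus-Viterbi step might look as follows. The real system uses HTK with 14-state character models and continuous output distributions; here two states per character, one-dimensional Gaussian emissions, and a hypothetical two-letter alphabet keep the sketch small.

```python
import math

def gaussian_loglik(mu, sigma):
    """One-dimensional Gaussian emission log-likelihood."""
    def ll(x):
        return (-0.5 * math.log(2 * math.pi * sigma ** 2)
                - (x - mu) ** 2 / (2 * sigma ** 2))
    return ll

def word_model(word, char_models):
    """Concatenate the linear character chains into one word chain."""
    return [state for c in word for state in char_models[c]]

def viterbi_score(states, seq):
    """Best log-probability of emitting seq along a left-to-right chain
    of (self_transition_prob, emission_loglik) states, entering at the
    first state and finishing in the last one."""
    neg = float('-inf')
    prev = [neg] * len(states)
    prev[0] = states[0][1](seq[0])
    for x in seq[1:]:
        cur = [neg] * len(states)
        for i, (self_p, ll) in enumerate(states):
            stay = prev[i] + math.log(self_p)
            move = prev[i - 1] + math.log(1 - states[i - 1][0]) if i else neg
            cur[i] = max(stay, move) + ll(x)
        prev = cur
    return prev[-1]

# Hypothetical two-letter alphabet, two states per character:
char_models = {
    'a': [(0.6, gaussian_loglik(0.0, 1.0)), (0.6, gaussian_loglik(1.0, 1.0))],
    'b': [(0.6, gaussian_loglik(4.0, 1.0)), (0.6, gaussian_loglik(5.0, 1.0))],
}
seq = [0.1, 0.9, 4.2, 4.9]           # one feature value per window position
vocab = ['ab', 'ba', 'aa']
scores = {w: viterbi_score(word_model(w, char_models), seq) for w in vocab}
best = max(scores, key=scores.get)
```

As in the system described above, every word in the vocabulary is scored against the same feature sequence and the best-scoring word model is returned as the recognition result.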

5 Experiments and Results


All experiments described in the following are based on a subset of 59 forms (c03???[a-f]) from the IAM-database [11]. The dataset contains 541 handwritten text lines with a total of 3899 word instances from a vocabulary of 412 words. The whole database (1066 forms) has an average of 8.59 lines per form and 8.98 words per line. First the preprocessed text lines were segmented into individual words using the method described in Section 2. As ground truth for the database described in [11] is available only on the level of whole text lines, but not on the level of individual words, the segmentation results were manually checked for errors. Two kinds of errors were counted: errors caused by over-segmentation and errors caused by under-segmentation. In case
¹ The training of the HMMs has to be done on whole text lines because no truth values for the individual word images are available in the IAM-database.

both types of errors occurred together (for example, part of a word was merged with part of the predecessor or successor word), then both errors were counted. To investigate how the algorithm behaves depending on the segmentation threshold θ, we have varied the scaling constant α in eq. (3) from 0.8 to 1.5. The

results obtained are reported in Table 1.

α       0.8     0.9     1.0     1.1     1.2     1.5
over    6.95%   4.35%   3.62%   2.36%   2.23%   1.39%
under   1.07%   1.09%   1.46%   2.08%   2.77%   6.44%
total   8.02%   5.38%   5.08%   4.44%   5.03%   7.83%

Table 1: Segmentation performance for varied threshold θ.

For α = 1.0 a segmentation error rate of 5.08% was measured, with 3.62% over- and 1.46% under-segmentations. If the threshold θ is decreased, by scaling it with α < 1, it can be observed that the number of under-segmentations decreases, while the number of over-segmentations increases. The opposite behavior can be observed if the threshold is increased. A graphical representation of the segmentation errors is shown in Figure 7. The best segmentation performance of 4.44% segmentation error was measured for α = 1.1.

Compared to the error rates reported in [9] (9.70%) and [10] (6.68%), the performance is slightly better. The differences may be due to the different data sets used in the experiments. In another experiment, the performance of the word recognizer was measured. For this experiment the word images obtained under the optimal value of α were used, and for the recognition it was assumed that each image contains exactly one single word. The set of word images was divided into five subsets. Each of these subsets was tested with the HMMs trained on the corresponding line images of the four remaining sets. Under these circumstances a word recognition rate of 73.45% was measured. Clearly, part of the recognition errors are due to the over- or under-segmentation errors


Figure 7: Error rate [%] caused by over- and under-segmentation, and their sum, as a function of the scaling factor α.

committed by the segmentation procedure. If only perfectly segmented words are used for recognition, 77.2% of the words are recognized correctly. Compared to results reported on the same data (78.5%, without language model) [16], a lower recognition rate is obtained.
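The five-fold protocol of this experiment can be sketched as follows; the round-robin split and the stand-in items are illustrative only, and the paper does not specify how the subsets were actually drawn.

```python
def five_fold_splits(items, k=5):
    """Round-robin partition into k disjoint test folds; for each fold
    the remaining items form the training set, mirroring the experiment
    in which the HMMs were trained on the other four subsets."""
    folds = [items[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

word_images = list(range(20))        # stand-ins for segmented word images
splits = list(five_fold_splits(word_images))
```

Every item appears in exactly one test fold, so each word image is recognized exactly once by models that never saw its line during training.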

6 Conclusions
In this paper we have presented a recognition system for general handwritten text. The system includes procedures for preprocessing and normalization, the segmentation of whole pages into individual words, feature extraction, and recognition. The normalization of text lines consists of skew and slant correction, positioning and horizontal scaling. Following these preprocessing steps, the text lines are split into individual words. This is done without any knowledge about the semantic contents of the writing. A segmentation method based on the distance of convex hulls and

the minimal spanning tree is used. With this method a minimal segmentation error of 4.44% was obtained. Using the segmented words in an HMM based recognizer, 73.45% of the words were recognized correctly. Writer independent recognition of general handwritten text is a challenging problem that is far from being solved. It can be expected that systems that include multiple independent classifiers achieve a better recognition rate than each of the individual classifiers involved. Because the recognition procedure described in this paper, which takes individual words as input, is very different from our approach described in [8], which works on complete lines of text, a performance improvement seems possible if both methods are combined with each other. Appropriate classifier combination techniques are subject to our future research.

References
[1] C.Y. Suen, C. Nadal, R. Legault, T.A. Mai, and L. Lam. Computer recognition of unconstrained handwritten numerals. Special Issue of Proc. of the IEEE, 80(7):1162–1180, 1992.

[2] J.-C. Simon. Off-line cursive word recognition. Special Issue of Proc. of the IEEE, 80(7):1150–1161, July 1992.

[3] G. Kaufmann and H. Bunke. Automated reading of cheque amounts. Pattern Analysis and Applications, 3(2):132–141, 2000.

[4] A. Kaltenmeier, T. Caesar, J.M. Gloger, and E. Mandler. Sophisticated topology of hidden Markov models for cursive script recognition. In Proc. of the 2nd Int. Conf. on Document Analysis and Recognition, Tsukuba Science City, Japan, pages 139–142, 1993.

[5] N. Gorski, V. Anisimov, E. Augustin, D. Price, and J.-C. Simon. A2iA Check Reader: A family of bank check recognition systems. In Proc. of the 5th Int. Conf. on Document Analysis and Recognition, Bangalore, India, pages 523–526, 1999.


[6] B. Lazzerini, F. Marcelloni, and L.M. Reyneri. Beatrix: A self-learning system for off-line recognition of handwritten texts. Pattern Recognition Letters, 18(6):583–594, June 1997.

[7] G. Kim, V. Govindaraju, and S.N. Srihari. Architecture for handwritten text recognition systems. In S.-W. Lee, editor, Advances in Handwriting Recognition, pages 163–172. World Scientific Publ. Co., 1999.

[8] U. Marti and H. Bunke. Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. To appear in Int. Journal of Pattern Recognition and Artificial Intelligence, 2001.

[9] G. Seni and E. Cohen. External word segmentation of off-line handwritten text lines. Pattern Recognition, 27(1):41–52, January 1994.

[10] U. Mahadevan and R.C. Nagabushnam. Gap metrics for word separation in handwritten lines. In Proc. of the 3rd Int. Conf. on Document Analysis and Recognition, Montréal, Canada, volume 1, pages 124–127, 1995.

[11] U.-V. Marti and H. Bunke. A full English sentence database for off-line handwriting recognition. In Proc. of the 5th Int. Conf. on Document Analysis and Recognition, Bangalore, India, pages 705–708, 1999.

[12] A. Drozdek. Data Structures and Algorithms in C++. PWS Pub. Co., 1996.

[13] A. Kundu. Handwritten word recognition using hidden Markov model. In H. Bunke and P.S.P. Wang, editors, Handbook of Character Recognition and Document Image Analysis, chapter 6, pages 157–182. World Scientific Publ. Co., 1997.

[14] L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993.

[15] S. Young, J. Jansen, J. Odell, D. Ollason, and P. Woodland. The HTK Book. Entropic, 1999.

[16] U. Marti. Offline Erkennung handgeschriebener Texte. PhD thesis, University of Bern, Switzerland, 2000.

