
ET-Tahir ZEMOURI, Youcef CHIBANI and Youcef BRIK

{tzemouri ; ychibani; ybrik}@usthb.dz



Faculty of Electronic and Computer Science
University of Science and Technology Houari Boumediene
Algiers, Algeria
Combined Binarization Approach
for Historical Arabic Document Image
1st Conference on Theoretical and Applicative Aspects of Computer Science
CTAACS'12 November 25-26, 2012


Outline
1. Introduction
2. Binarization
3. Proposed method
4. Binarization
5. Experimental results
6. Conclusion
Context
In recent years, the document analysis and
recognition community has shown increasing
interest in the processing of historical
documents. These old documents often have
historical and cultural significance, and the aim
is to scan them and build digital libraries.

The challenge is to build automatic search
engines that allow users to find and
retrieve only the documents with relevant
content from the entire collection.
Binarization?
T. ZEMOURI Y. CHIBANI and Y.BRIK 1 CTAACS 2012
Fig. Gray-level image and its global thresholding with T = 60, T = 128 and T = 200.
Related works
List of references
1. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66 (1979)
2. Ridler, T.W., Calvard, S.: Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man, and Cybernetics, vol. 8, no. 8, pp. 630-632 (1978)
Otsu's method [1] computes T so as to minimize the within-class variance of the two
classes (equivalently, to maximize the between-class variance).
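As an illustration of Otsu's criterion, the sketch below scans all 256 candidate thresholds of an 8-bit image and keeps the one with the largest between-class variance. It is a standard textbook formulation written for this note, not the authors' code; the function name `otsu_threshold` and the convention "class 0 = gray levels <= T" are our assumptions.

```python
import numpy as np

def otsu_threshold(image: np.ndarray) -> int:
    """Return the Otsu threshold T of an 8-bit gray-level image.

    Class 0 holds gray levels <= T, class 1 the others (assumed convention).
    Maximizing the between-class variance is equivalent to minimizing
    the weighted within-class variance.
    """
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    omega = np.cumsum(prob)          # probability of class 0 for each candidate T
    mu = np.cumsum(prob * levels)    # cumulative first moment up to T
    mu_total = mu[-1]                # global mean gray level

    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    sigma_b2[denom == 0] = 0.0       # one class empty: criterion undefined
    return int(np.argmax(sigma_b2))
```

Pixels with `image > otsu_threshold(image)` are then treated as background (white) and the rest as ink.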

Isodata [2] computes the threshold by iteratively splitting the gray-level histogram into
two classes:
1- Divide the interval of non-null histogram values into two equal parts.
2- Take $m_1$ and $m_2$ as the arithmetic means of the two classes.
3- Repeat until convergence: compute the threshold T as the integer closest to
$(m_1 + m_2)/2$, then update the two means $m_1$ and $m_2$.
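The three Isodata steps above can be sketched as follows. This is a minimal sketch assuming an 8-bit image, class 1 = gray levels <= T, rounding halves upward for "closest integer", and a hypothetical `max_iter` safeguard; none of these details come from the slides.

```python
import numpy as np

def isodata_threshold(image: np.ndarray, max_iter: int = 100) -> int:
    """Iterative (Isodata) threshold of an 8-bit gray-level image.

    Class 1 holds gray levels <= T, class 2 holds gray levels > T.
    """
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    occupied = np.nonzero(hist)[0]           # gray levels with non-null counts
    lo, hi = int(occupied[0]), int(occupied[-1])

    # Step 1: split the interval of non-null values into two equal parts.
    t = (lo + hi) // 2
    for _ in range(max_iter):
        n1, n2 = hist[: t + 1].sum(), hist[t + 1 :].sum()
        if n1 == 0 or n2 == 0:               # degenerate case: a single class
            break
        # Step 2: arithmetic means of the two classes.
        m1 = (hist[: t + 1] * levels[: t + 1]).sum() / n1
        m2 = (hist[t + 1 :] * levels[t + 1 :]).sum() / n2
        # Step 3: T = closest integer to (m1 + m2) / 2 (halves rounded up).
        new_t = int(np.floor((m1 + m2) / 2 + 0.5))
        if new_t == t:                       # converged
            break
        t = new_t
    return t
```

For a two-level image with gray values 50 and 200, the first split already gives T = 125, and the means (50 and 200) confirm it, so the loop stops after one update.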
Related works
List of references
1. Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs, New Jersey (1986)
2. Sauvola, J., Pietikainen, M.: Adaptive document image binarization. Pattern Recognition, vol. 33, no. 2, pp. 225-236 (2000)
3. Khurshid, K., Siddiqi, I., Faure, C., Vincent, N.: Comparison of Niblack inspired binarization methods for ancient documents. 16th International Conference on Document Recognition and Retrieval, vol. 7247, pp. 0U1-0U9 (2009)
Niblack's method [1]: $T = m + k \cdot \sigma$
$m$: mean value, $\sigma$: standard deviation (both computed over a local window around the pixel), $k = -0.2$

Sauvola's method [2]: $T = m \cdot \left(1 + k \cdot \left(\frac{\sigma}{R} - 1\right)\right)$
$k = 0.5$ and $R = 128$

NICK method [3]: $T = m + k \cdot \sqrt{\frac{\sum P_i^2 - m^2}{N}}$
$k \in [-0.2, -0.1]$, $P_i$: gray-level value of pixel $i$, and $N$: number of pixels
The proposed method
Global thresholding ($T = \mathrm{mean}(I)$):
$$I_{int}(x, y) = \begin{cases} 255 & \text{if } I(x, y) > T \\ I(x, y) & \text{otherwise} \end{cases}$$
Local thresholding (Sauvola) of $I_{int}$, giving the binarized image
Fig. Histogram of the document image, with the thresholds extracted by Otsu's method, the Isodata method and the mean pixel value.
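A minimal sketch of the two-stage pipeline on the slide: a global pass with T = mean(I) whitens every pixel brighter than the mean, and Sauvola (k = 0.5, R = 128) is then applied to the intermediate image $I_{int}$. This is our reading of the slide, not the authors' implementation. The window size, the reflect padding and the function names are hypothetical, and the sketch does not model how the Otsu and Isodata thresholds shown in the histogram figure might enter the method.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sauvola_threshold(image, window=15, k=0.5, R=128.0):
    """Per-pixel Sauvola threshold T = m * (1 + k * (s / R - 1))."""
    pad = window // 2                               # window is assumed odd
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    win = sliding_window_view(padded, (window, window))
    m = win.mean(axis=(-2, -1))
    s = win.std(axis=(-2, -1))
    return m * (1.0 + k * (s / R - 1.0))

def combined_binarization(image, window=15):
    """Global mean thresholding followed by Sauvola (hypothetical sketch)."""
    img = image.astype(np.float64)
    T = img.mean()
    # Global step: pixels brighter than the mean become white (255), the
    # others keep their gray level; this is the intermediate image I_int.
    i_int = np.where(img > T, 255.0, img)
    # Local step: Sauvola thresholding of I_int; ink -> 0, background -> 255.
    local_T = sauvola_threshold(i_int, window)
    binary = np.where(i_int > local_T, 255, 0).astype(np.uint8)
    return binary, i_int
```

The design intent, as we read it, is that the global pass removes most of the light background before Sauvola's local statistics are computed, so that Sauvola mainly has to separate ink from the remaining darker pixels.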
Test and results
Datasets
Degraded samples from the National Library (BN, Algiers).
book 1842 - -
116 Arabic printed pages
Fig. Representative sample of database
Test and results
Evaluation system
Preprocessing
Database
Page separation
Binarisation
Deskew
Border removal
Segmentation
Feature generation: projection profile, upper profile, lower profile, number of vertical white/black pixel transitions
Fig. An original word image and the features used in word image matching: (a) original word image, (b) vertical projection, (c) upper profile, (d) lower profile, (e) number of vertical white/black pixel transitions.
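The four column features can be computed from a binarized word image as below. This sketch assumes a boolean array with `True` = ink (pass `binary == 0` if ink is stored as 0). The handling of columns without ink (profiles set to the image height) and the convention of counting a transition when ink starts at the top row are our choices, not taken from the slides.

```python
import numpy as np

def word_features(ink: np.ndarray) -> np.ndarray:
    """Column-wise features of a binarized word image (True = ink pixel).

    Returns an array of shape (width, 4) whose columns are: vertical
    projection, upper profile, lower profile, white-to-black transitions.
    """
    ink = np.asarray(ink, dtype=bool)
    height = ink.shape[0]
    has_ink = ink.any(axis=0)
    # Vertical projection: number of ink pixels in each column.
    projection = ink.sum(axis=0)
    # Upper profile: distance from the top edge to the first ink pixel.
    upper = np.where(has_ink, ink.argmax(axis=0), height)
    # Lower profile: distance from the bottom edge to the last ink pixel.
    lower = np.where(has_ink, ink[::-1].argmax(axis=0), height)
    # White-to-black transitions, scanning each column top to bottom;
    # the area above the image is treated as white.
    transitions = (ink[1:] & ~ink[:-1]).sum(axis=0) + ink[0]
    return np.stack([projection, upper, lower, transitions], axis=1).astype(np.float64)
```

Each word thus becomes a sequence of 4-dimensional vectors, one per column, which is the form a sequence-alignment method such as DTW expects.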
DTW matching of the feature sequences of two word images (projection profile, upper word profile, lower word profile, number of vertical white/black pixel transitions)
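Dynamic Time Warping aligns two feature sequences of different lengths by allowing columns to be repeated. The sketch below is the classic DTW recurrence with a squared Euclidean local cost. The cost function and the absence of any length normalization or warping-window constraint are our assumptions; a practical system would likely normalize the cost so that words of different widths can be compared.

```python
import numpy as np

def dtw_distance(a, b) -> float:
    """Classic DTW cost between two feature sequences.

    a: (n, d) array, one d-dimensional feature vector per column of word 1
    b: (m, d) array, the same for word 2 (1-D inputs are treated as d = 1)
    """
    a = np.asarray(a, dtype=np.float64).reshape(len(a), -1)
    b = np.asarray(b, dtype=np.float64).reshape(len(b), -1)
    n, m = len(a), len(b)
    # Local cost: squared Euclidean distance between every pair of columns.
    dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor steps.
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    return float(acc[n, m])
```

A horizontally stretched copy of a sequence, such as `[0, 0, 1, 1, 2]` versus `[0, 1, 2]`, gets a DTW cost of 0. This tolerance to width variations is why DTW suits word image matching.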
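Precision (%) $= \frac{C}{M} \times 100$
Recall (%) $= \frac{C}{N} \times 100$
F-measure (%) $= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
where $C$ is the number of correctly retrieved words, $M$ the number of retrieved words, and $N$ the number of relevant words.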
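As a worked example of the three measures, the helper below reads C as correctly retrieved words, M as retrieved words and N as relevant words. The counts in the test are invented for illustration and are not results from this work. Because precision and recall are both expressed in percent, the F-measure formula directly yields a percentage.

```python
def retrieval_scores(c: int, m: int, n: int):
    """Precision, recall and F-measure in percent.

    c: correctly retrieved words, m: retrieved words, n: relevant words.
    Zero denominators yield 0.0 instead of raising an error.
    """
    precision = 100.0 * c / m if m else 0.0
    recall = 100.0 * c / n if n else 0.0
    total = precision + recall
    f_measure = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For instance, 40 correct hits among 50 retrieved words, with 80 relevant words in the collection, give a precision of 80 %, a recall of 50 % and an F-measure of about 61.5 %.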
Fig. Binarization results of a document image: (a) original image, (b) Otsu, (c) Isodata, (d) Niblack, (e) Sauvola, (f) NICK, (g) proposed method.
Table. The Evaluation Measures
Objective
Discrimination between machine-printed and handwritten text
Results
Encouraging results obtained by combining Radon energy and statistical
features with SVM classifiers using the RBF kernel
Future work
Distinguish machine-printed from handwritten text in both Arabic and Latin scripts

Thank you
