RESEARCH ARTICLE
OPEN ACCESS
ABSTRACT
This paper presents a technique for recognizing Urdu words in the Nastalique font using ligatures as the units of
recognition. In Nastalique, the overlapping of words and characters makes optical recognition more complex, whereas optical
character recognition of the Latin script is relatively easier. Based on research on Nastalique OCR, this paper discusses a
proposed finite state model for the optical recognition of printed Nastalique text.
Keywords:- Pattern Recognition, Optical Character Recognition, Urdu Text, Ligatures.
I. INTRODUCTION
Alphabet
The Urdu character set comprises 38 basic shapes.
This count does not include Aerab (diacritics used for
pronunciation and vowel sounds). The alphabet is shown in
Figure 1.