Professional Documents
Culture Documents
By
Zaheer Ahmad
MS-IT
Institute of Management Science, Peshawar, Pakistan
6th February, 2009
1
Optical Character Recognition
• Optical Character Recognition (OCR) is the
mechanical or electronic translation / reading
of images of handwritten, typewritten or
printed text (usually captured by a scanner)
into machine-editable text
• OCR is a branch of
– Pattern Recognition and Machine Vision
2
Optical Character Recognition
Main Three Steps
3
OCR History
• Research commenced in the 1950s
• In the 1960s and 1970s, new OCR applications were
developed for businesses, banks, hospitals, post
offices; insurance, railroad, and aircraft
companies; and newspaper publishers
• In the 1960s, OCR machines tended to make many
errors when print quality was poor, caused
either by wide variations in type fonts and the
roughness of the paper surface or by the
cotton ribbons of typewriters
4
• To reduce errors: standardization of print
fonts, paper, and ink qualities
• In the 1970s, new fonts such as OCR-A and
OCR-B were designed
• These efforts revolutionized the data-entry
process ….….. loss of jobs
5
Urdu and Arabic Script OCR History
• Little work in the field, mostly on standalone
alphabets…India….and Pak
BUT
• Some work on Arabic and Farsi, but still……..
• Different styles are used for Arabic, Farsi and
Urdu, i.e. Nasta'liq ….Naskh….
• Some work on Pashto
6
Some Applications Of Urdu OCR
• It will expand and multiply knowledge already
available in hard copies, i.e.
– Centuries-old rare scripts in Arabic, Urdu and
Persian will become available to the common man
– Improve the interaction between man and
machine in many applications, including
• office automation,
• bank check verification,
• business and data-entry applications,
7
Some Applications Of OCR
• library archives,
• document identification,
• e-book production,
• invoice and shipping-receipt processing,
• subscription collections,
• questionnaire processing,
• exam-paper processing, and
• online address and signboard reading
8
Pattern Recognition
• The act of taking in raw data and taking an action
based on the category of the data (also known as
classification or pattern classification)
• It uses methods from statistics, machine learning
and other areas.
• Some popular techniques for pattern recognition
include:
• Neural Networks
• Hidden Markov Models ----Probability
• Bayesian networks ….
9
Urdu Script (اردو رسم الخط)
• The national language of Pakistan, and
• One of the popular scripts of the Indian
subcontinent, which evolved there from a mixture
of the Arabic, Turkish, Farsi and Hindi
languages
• Spoken by more than 60 million speakers in over
20 countries
• 58-character set defined by the NLA Pakistan
• 40 basic characters plus one do-chashmi-hey (ھ) used to
form all composite alphabets; therefore the
working set consists of 41 alphabets.
10
Character Set (58 alphabets) of Urdu Script
11
• Urdu is a modification of the Persian alphabet,
which is itself a derivative of the Arabic
alphabet,
• and has adopted some characters, like Rhe (ڑ), from
the Hindi script.
• Urdu is a right-to-left script written in the
calligraphic Nasta'liq style, whereas the Naskh
style is used for Arabic.
12
• Most combined characters form an angle of
about 45 degrees to the horizontal line,
• because of which Urdu script is faster to read
than Roman script, but
• it makes it harder for machines to recognize a
word or segment one character from the
rest,
• and for novice readers as well.
13
• No capital or small characters, but the last character of a
word is considered to be capital as it appears in its full form.
• Stand-alone and joining forms ---- a character changes not only its
shape but also its size.
• This increases the number of classes to be recognized. In
our experiments we have used 54 different classes for 41
different Urdu characters, e.g.:
ث ثثث sē
ج ججج jīm
16
• Some characters contain a closed loop; the
character ھ contains two loops.
• The open portion of the characters Jim (ـﺝ), Hey
(ـﺣ) and Khe (ـﺥ) forms a triangle.
• The loop of the characters Mem (م), Waw (و) and Ein (ـﻌـ)
sometimes becomes so small that the internal
opening disappears.
• Hamza (ء), a zigzag shape, is not really a letter, but it
can cause difficulty in the segmentation process as it
resembles the segmented middle form of
Ein (ع).
17
• Dots may appear as two separated dots,
touching dots, a hat, or a stroke.
• Another style of Urdu handwriting is
artistic or decorative calligraphy,
• which is usually full of overlapping, making
the recognition process difficult even for
human beings, let alone computers.
18
What Has Been Done
--------------------NOTHING------------
• No text databases or dictionaries are available, except
the one under preparation by the Urdu Language
Authority, but their web site shows slow progress so
far.
• Not even a standard keyboard exists. The National
Language Authority of Pakistan has devised a
keyboard in which the most-used characters sit
under the main fingers, but it is very different
from the one already in use (the phonetic keyboard
of InPage).
19
• Moreover, it is still to be adopted by software vendors,
as even Windows Vista uses its own version of
the Urdu keyboard.
• The research carried out on the Urdu language is
mostly scattered and from outside the Urdu-speaking
world.
• No specialized conferences or
symposia have been conducted so far.
• There is no financial support from the government.
20
Neural Networks
21
NN A Brain-Inspired Model
[Diagram: a layered network of neurons mapping inputs to outputs]
22
NN A Brain-Inspired Model
• A neural network acquires knowledge through
learning.
• A neural network's knowledge is stored within
inter-neuron connection strengths known as
synaptic weights.
• The largest modern neural networks
achieve complexity comparable to the
nervous system of a fly.
23
Historical Background
• 1943: McCulloch and Pitts proposed the first
computational model of the neuron.
• 1949: Hebb proposed the first learning rule.
• 1958: Rosenblatt's work on perceptrons.
• 1969: Minsky and Papert exposed limitations of the
theory.
• 1970s: decade of dormancy for neural networks.
• 1980s-90s: neural networks return (self-organization,
back-propagation algorithms, etc.)
24
NN Applications
• Process Modeling and Control- Creating a neural network model for a physical
plant then using that model to determine the best control settings for the plant.
• Machine Diagnosis- Detect when a machine has failed so that the system can
automatically shut down the machine when this occurs.
• Target Recognition- Military application which uses video and/or infrared image data to
determine if an enemy target is present.
• Medical Diagnosis- Assisting doctors with their diagnosis by analyzing the reported
symptoms and/or image data such as MRIs or X-rays.
• Target Marketing- Finding the set of demographics which have the highest response
rate for a particular marketing campaign.
• Voice Recognition - Transcribing spoken words into ASCII text.
• Financial Forecasting (stock prediction) - Using the historical data of a security to
predict the future movement of that security.
• Quality Control - Attaching a camera or sensor to the end of a production process to
automatically inspect for defects.
• Intelligent Search - An internet search engine that provides the most relevant content
and banner ads based on the users' past behavior.
• Fraud Detection - Detect fraudulent credit card transactions and automatically decline
the charge. 25
How NN Work (Mathematically)
• Linear and nonlinear pattern classification
• Regression / function estimation
• Curve fitting
Why Use NN
• Parallel Processing
• Fault tolerance
• Self-organization
• Generalization ability
• Continuous adaptivity
26
Artificial Neurons
• Neural networks are made up of nodes which have
– Input edges, each with some weight
– Output edges (with weights)
– An activation level (a function of the inputs)
• Weights of edges can be positive or negative and may change
over time (learning)
• The node's input function is the weighted sum of the activation
levels of its inputs
• The activation level is a linear or non-linear transfer function a
of that input
• Some nodes are inputs, some are outputs.
27
A Model of Artificial Neuron
[Diagram: inputs x1 … xm with weights wi1 … wim feed a summation f(·) and an
activation a(·) that produces the output yi; the constant input xm = 1 with
weight wim = θi supplies the bias]
28
A Model of Artificial Neuron
y_i(t+1) = a(f_i),   where   f_i = Σ_{j=1}^{m} w_ij x_j

a(f) = 1 if f ≥ 0, 0 otherwise

(the constant input x_m = 1 with weight w_im = θ_i supplies the bias)
29
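The threshold neuron on this slide can be sketched in a few lines of Python (an illustration only; the function name and the sample weights are mine):

```python
# Minimal sketch of the threshold neuron: y = a(f), f = sum_j w_ij * x_j,
# with a(f) = 1 if f >= 0 else 0.

def neuron(weights, inputs):
    """Weighted sum followed by a hard-threshold activation."""
    f = sum(w * x for w, x in zip(weights, inputs))
    return 1 if f >= 0 else 0

# The last weight plays the role of the bias theta_i, paired with the
# constant input x_m = 1 (as in the diagram).
print(neuron([0.5, 0.5, -0.8], [1, 1, 1]))  # fires: 0.5 + 0.5 - 0.8 >= 0
```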
Structural Types of NN
• Un-weighted -- McCulloch–Pitts ( 1943 )
• Weighted---- Introduced by Hebb
• Supervised
• Perceptron -- by Frank Rosenblatt—foundation
• ADALIN and MADALIN
• FFNN
• Unsupervised
• ART1 and ART2
• Kohenon’s Self Organizing Maps(SOM)..etc
30
The Perceptron
[Diagram: inputs x1 … xn with weights w1 … wn, plus a bias input x_{n+1} = -1
with weight w_{n+1} = θ]

a = bias + Σ_i w_i x_i

y = 1 if a ≥ 0, 0 if a < 0

• If a decision boundary separating two classes of patterns exists, then they
are said to be linearly separable, and the simple network can correctly
classify the patterns.
• Decision boundaries of linearly separable classes can be determined
either by some learning procedure or by solving linear equation
systems based on representative patterns of each class
• If such a decision boundary does not exist, then the two classes are
said to be linearly inseparable.
• Linearly inseparable problems cannot be solved by the simple
network; a more sophisticated architecture is needed.
32
• Examples of linearly separable classes

- Logical AND function (bipolar patterns):

   x1   x2 |  y
   -1   -1 | -1
   -1    1 | -1
    1   -1 | -1
    1    1 |  1

  Decision boundary: w1 = 1, w2 = 1, b = -1, so -1 + x1 + x2 = 0
  (x: class I, y = 1; o: class II, y = -1)
  Equation of the line = decision boundary

- Logical OR function (bipolar patterns):

   x1   x2 |  y
   -1   -1 | -1
   -1    1 |  1
    1   -1 |  1
    1    1 |  1

  Decision boundary: w1 = 1, w2 = 1, b = 1, so 1 + x1 + x2 = 0
  (x: class I, y = 1; o: class II, y = -1)
33
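The AND/OR boundaries above can be checked directly in code (a minimal sketch; the `classify` helper is mine, with the slides' weights passed in):

```python
def classify(x1, x2, w1, w2, b):
    """Threshold unit on bipolar inputs: +1 if b + w1*x1 + w2*x2 >= 0, else -1."""
    return 1 if b + w1 * x1 + w2 * x2 >= 0 else -1

# AND: weights from the slide (w1 = 1, w2 = 1, b = -1);
# only (1, 1) lands on the positive side of the line -1 + x1 + x2 = 0.
for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x1, x2, classify(x1, x2, 1, 1, -1))

# OR: w1 = 1, w2 = 1, b = 1; only (-1, -1) falls below 1 + x1 + x2 = 0.
```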
• Examples of linearly inseparable classes

- Logical XOR (exclusive OR) function (bipolar patterns):

   x1   x2 |  y
   -1   -1 | -1
   -1    1 |  1
    1   -1 |  1
    1    1 | -1

  (x: class I, y = 1; o: class II, y = -1; no single line separates the classes)
34
Multilayer NN
• Neural net for nonlinear classification
• Combination of perceptrons
• Backpropagation learning
35
Multilayer FFNN
A NN with one or more hidden layers
What does each of the layers do?
• 1st layer draws linear boundaries
• 2nd layer combines the boundaries
• 3rd layer can generate arbitrarily complex boundaries
36
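The layer roles above can be made concrete with a hand-wired two-layer network for XOR (the weights below are chosen by me for illustration, not taken from the slides):

```python
def step(f, theta):
    """Hard-threshold activation: 1 if f >= theta else 0."""
    return 1 if f >= theta else 0

def xor(x1, x2):
    # 1st layer: two linear boundaries, (x1 AND NOT x2) and (x2 AND NOT x1)
    h1 = step(x1 - x2, 0.5)
    h2 = step(x2 - x1, 0.5)
    # 2nd layer: combines the two half-planes with an OR
    return step(h1 + h2, 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor(a, b))  # reproduces the XOR truth table: 0, 1, 1, 0
```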
Backpropagation Algorithm
• Multiple outputs.
• Forward pass
• Error calculation
• Backward propagation
• No guarantee of reaching the best possible
weights after the corrections.
• Classifies inputs into multiple classes.
• Can be modified to represent any function.
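The forward / error / backward steps can be sketched with NumPy on the XOR problem (network size, learning rate and epoch count below are illustrative assumptions, not the thesis configuration):

```python
import numpy as np

# Toy backpropagation: one hidden layer, sigmoid activations,
# squared-error loss, trained on the XOR data.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights
b2 = np.zeros(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def loss():
    H = sig(X @ W1 + b1)
    return float(np.mean((sig(H @ W2 + b2) - T) ** 2))

first = loss()
for _ in range(5000):
    # forward pass
    H = sig(X @ W1 + b1)
    Y = sig(H @ W2 + b2)
    # backward propagation of the error through each layer
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    # gradient-descent weight corrections (learning rate 0.5)
    W2 -= 0.5 * H.T @ dY;  b2 -= 0.5 * dY.sum(0)
    W1 -= 0.5 * X.T @ dH;  b1 -= 0.5 * dH.sum(0)

print(first, loss())  # the error shrinks, though reaching the best
                      # possible weights is not guaranteed
```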
MATLAB and NN Toolbox
• The name MATLAB stands for matrix
laboratory. MATLAB is a high-performance language for
technical computing. It integrates computation,
visualization, and programming in an easy-to-use
environment where problems and solutions are
expressed in familiar mathematical notation. Typical
uses include:
• Math and computation
• Algorithm development
• Modeling, simulation, and prototyping
• Data analysis, exploration, and visualization
• Scientific and engineering graphics
• Application development, including Graphical User
Interface building
38
Urdu Optical Character
Recognition Using Feedforward
Neural Networks
• Input text image
• Segmentation part
• Neural network /
classification part
41
i ii iii iv v vi vii viii
i 0 0 0 0 1 0 1 0
ii 0 0 1 0 0 0 1 0
iii 0 1 1 1 1 0 1 0
iv 0 1 0 1 0 0 1 0
v 0 1 0 1 0 1 1 0
vi 0 0 0 1 0 1 1 0
vii 0 0 1 1 0 1 1 0
viii 0 0 1 1 0 1 1 0
ix 0 0 1 1 0 1 1 0
42
• Energy level
– For word segmentation, zero-level energy is
selected
– For character segmentation, the energy of each seam
is calculated and compared with the average
energy of all seams of the image
• Character size and threshold values
• Large segments are further segmented
• Small segments are merged
43
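The zero-energy rule for word segmentation might be sketched as follows, assuming a column's ink count as its energy measure (function and variable names are mine, not the thesis code):

```python
import numpy as np

# Hedged sketch: cut the text line wherever a column's "energy"
# (its ink count) drops to zero, yielding one span per word.
def word_spans(img):
    """img: 2-D 0/1 array, 1 = ink. Returns (start, end) column spans."""
    energy = img.sum(axis=0)            # per-column energy
    spans, start = [], None
    for col, e in enumerate(energy):
        if e > 0 and start is None:
            start = col                 # entering an ink region
        elif e == 0 and start is not None:
            spans.append((start, col))  # a zero-energy column ends the word
            start = None
    if start is not None:
        spans.append((start, img.shape[1]))
    return spans

page = np.zeros((5, 12), dtype=int)
page[2, 1:4] = 1   # first "word"
page[2, 6:10] = 1  # second "word"
print(word_spans(page))  # -> [(1, 4), (6, 10)]
```

Character segmentation would then compare each candidate seam's energy against the average over all seams, per the slide.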
Garbage Characters
the main problem in the algorithm
44
Training Patterns
• Arial font of size 36 was resized
– enlarged in some cases and reduced in others,
using the 'imresize' function with the 'nearest' parameter
• 54 different classes
• 100 samples
• MS Paint, Photoshop and
MS Word
45
Neural Network Training and Sim
• A multilayer feedforward neural network (FFNN)
• Input layer: 21×15 (315 nodes) ---- the size of a character image
• Hidden layer: 2000 nodes ---- chosen by trial and error
• Output layer: 6 nodes ---- to cover 56 alphabets
• Activation functions: tansig and logsig
• epochs = 2000 to get trained / meet the goal of
0.0005 ---- goal selected from results
• Time: 5-7 hours on a 2 GHz Dual Core with 2 GB of
RAM
46
• Segmented characters are again resized before
being fed to the NN
• The 'sim' function returns a 6-digit binary number
• The number is matched against the 54-character
set (used as the target during training).
50
Conclusions
• The results show 70% success at the neural network output,
but the algorithm developed shows about 85% success when
judged by the human eye.
• Most of the errors (garbage characters) are produced at the
end character of a word, when the word ends on noon or
a character of similar shape.
• It is hard to find which character is the end character,
therefore the problem cannot be overcome easily.
• A large percentage of errors is produced by the characters
seen (س), sheen (ش), swad (ص), dwad (ض), noon (ن) and noon
ghuna (ں), which in most cases pass the character
test during segmentation,
• whereas bee (ب), pee (پ), tee (ت), tay (ٹ), cee (ث) and
fee (ف) also produce garbage characters in some cases.
51
Thanks
52
53
54
55
Binary
AND OR NOT
56
AND
[Diagram: AND as a single threshold unit — output y with threshold 0.5 and
weights w1, w2 on inputs x1, x2]

[Diagram: XOR as a two-layer network — inputs x1, x2 feed hidden units z1, z2
through weights ±2; z1, z2 feed the output Y]
61
Linear Separation
Linear Discriminant:   Y = a(X) + b
Logistic Regression:   Y = 1 / (1 + e^-(a(X) + b))
AND, OR, NOT
[Diagram: AND as a threshold unit — inputs x1 … xn with weights 1.0,
integrate then threshold at 1.5]
AND, OR, NOT
[Diagram: OR as a threshold unit — inputs x1 … xn with weights 1.0,
integrate then threshold at .9]
AND, OR, NOT
[Diagram: NOT as a threshold unit — input with weight -1.0,
integrate then threshold at .5]
67
68
Perceptron Learning Algorithm:

i. Initialise weights and threshold.
   Set wi(t), (0 <= i <= n), to be the weight i at time t, and θ to be the
   threshold value in the output node. Set w0 to be -θ, the bias, and x0
   to be always 1.
   Set wi(0) to small random values, thus initialising the weights and
   threshold.
ii. Present input and desired output.
   Present input x0, x1, x2, ..., xn and desired output d(t).
iii. Calculate the actual output.
   y(t) = fh[w0(t)x0(t) + w1(t)x1(t) + .... + wn(t)xn(t)]
iv. Adapt weights.
   wi(t+1) = wi(t) + η[d(t) - y(t)]xi(t), where 0 <= η <= 1 is a positive
   gain factor that controls the adaptation rate.
Steps iii. and iv. are repeated until the iteration error is less than a
user-specified error threshold or a predetermined number of iterations
have been completed.

Perceptron Learning Algorithm:

start: The weight vector w0 is generated randomly, set t := 0
test: A vector x ∈ P ∪ N is selected randomly,
   if x ∈ P and wt · x > 0 go to test,
   if x ∈ P and wt · x <= 0 go to add,
   if x ∈ N and wt · x < 0 go to test,
   if x ∈ N and wt · x >= 0 go to subtract.
add: set wt+1 = wt + x and t := t + 1, goto test
subtract: set wt+1 = wt − x and t := t + 1, goto test
69
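The weight-adaptation rule in step iv can be run directly; the sketch below trains on logical OR (the training set, gain factor and epoch count are illustrative choices of mine):

```python
# wi(t+1) = wi(t) + eta*(d - y)*xi, with w[0] as the bias weight (x0 = 1).
def train_perceptron(samples, eta=0.5, epochs=20):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x1, x2, d in samples:
            x = (1, x1, x2)
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            for i in range(3):          # adapt weights by the error d - y
                w[i] += eta * (d - y) * x[i]
    return w

# Logical OR is linearly separable, so the rule converges.
w = train_perceptron([(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)])
for x1, x2, d in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]:
    y = 1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else 0
    print(x1, x2, y == d)               # all four patterns classified correctly
```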
Neural Networks –
Training
Backpropagation training cycle
70
71