
Graduation project

Problem description:

To see the problem from a simple view, let's assume we have a parking gate at a company, an airport, or a high-security area, and we don't want to let any unauthorized cars or drivers get inside. The first step is to build a database that contains the cars and drivers who have the right of access to this place. The problem comes in the second step: how to check every car and driver at the gate and use the database to grant them access or prevent them from entering.

Problem description (cont.):
Here you have two ways:
The automated way: developing a real-time system that can recognize the car's plate number and the driver, and give the gate the order to open or close without any human interference.
The manual way: hiring a person to be in charge of the gate and check each car's plate number and each driver manually before letting them in.

Here comes our project: to develop the software that performs the functions of the automated system. It is the first step toward deploying it as a real-time system that can work in the real world.

Problem description (cont.): [figure: block diagram of the system]

As we can see from the last figure, which represents the block diagram of the system, we can divide the system into four main phases:
1. Face detection
2. Face recognition
3. Plate number localization & character segmentation
4. Optical character recognition

First Phase: Face Detection

Why are we using Face Detection?


The phase after face detection is face recognition, which compares the driver's image against a stored database to identify and recognize him. To work well, face recognition algorithms require the face to be detected in the original image and isolated from the background. Here comes the role of the face detection system: it takes the original image from the camera and isolates the driver's face without any background, so the new isolated image can be passed to the face recognition system.

Why is our face detection work different from other types of face detection?
The answer is that we deal with drivers' faces behind car glass, not faces in normal conditions. This divides our experimental work (as we will see later) into two steps:
1. Testing the algorithms on normal images (without glass).
2. Taking the successful algorithms from the first step and testing them again on new images (with glass).

Background for face detection features and techniques:

The algorithms used so far in this field can be categorized into three main categories:
i. Techniques based on information theory: these treat images mathematically and apply analytical methods to detect the face. Ex: the eigenfaces technique and the fisherfaces technique.
ii. Techniques based on skin segmentation in color spaces other than RGB: the idea here is that human skin occupies fixed, remarkable ranges in some color spaces, so we can identify the face area from those ranges. Ex: the HSV space and the YCrCb color space.
iii. Building a skin model and then template matching: in this algorithm, different small skin samples are used to build a Gaussian distribution of skin color; the image is then scanned to mark areas that fall inside this distribution (candidate areas), and template matching against an average face chooses the best area to be the face.

First Step (Experimental Work):


We begin with our experiments on the original images (without glass), study the algorithms that can be applied, and then take the most successful two or three of them, with high accuracy and convenient execution time, to the next step, which we discuss later.

First experiment: we started by trying the skin detection algorithm, which is based on converting the RGB image to the HSV domain and then to the YCrCb domain. Why? Because skin has a specified range in these domains, so if we scan the image pixel by pixel and check whether each pixel lies inside the skin range, we can finally mark the skin spots in the image, and of course they will include the face. The range is: 0.01 <= H <= 0.1, 40 <= Cr <= 165, 140 <= Cb <= 195.
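To make the pixel-scan idea concrete, here is a minimal Python/numpy sketch of the fixed-range rule (the project itself ran in MATLAB; the function names and the BT.601 conversion coefficients are our assumptions, while the range values are the ones on the slide):

```python
# A minimal sketch of the fixed-range skin-detection idea, not the
# project's actual code. Assumption: the input is an (H, W, 3) RGB image
# as floats in [0, 1]; YCbCr uses ITU-R BT.601 coefficients with offsets.
import numpy as np
from matplotlib.colors import rgb_to_hsv


def rgb_to_ycbcr(rgb):
    """Return the Y, Cb, Cr planes of an RGB image (floats in [0, 1])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 16 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128 + 112.0 * r - 93.786 * g - 18.214 * b
    return y, cb, cr


def skin_mask(rgb):
    """Boolean mask of the pixels inside the slide's skin range."""
    h = rgb_to_hsv(rgb)[..., 0]          # hue channel in [0, 1]
    _, cb, cr = rgb_to_ycbcr(rgb)
    return ((0.01 <= h) & (h <= 0.1) &
            (40 <= cr) & (cr <= 165) &
            (140 <= cb) & (cb <= 195))
```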

First Step (Experimental Work, cont.):


Due to our lack of experience at that time (the beginning of the project), we couldn't apply this algorithm correctly; we made some mistakes inside it that led to very bad accuracy (under 50%).

Second experiment: after that we tried another idea, template matching alone. This algorithm builds a template face (the average face) and performs a correlation operation while scanning the whole image, taking the point with the highest correlation result as the center of the detected face. But this algorithm had problems: it took a very long time to execute in MATLAB (about 1 minute), and the accuracy was also very bad (in the range of 50%). All this led us to try another algorithm.
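For illustration, a brute-force normalized cross-correlation scan looks like the following sketch; the nested scan over every window position is exactly what makes this approach slow. The template here is an assumed precomputed average face:

```python
import numpy as np


def match_template(image, template):
    """Exhaustively scan a 2-D gray image with a template (e.g. an average
    face) and return the top-left corner of the best-matching window by
    normalized cross-correlation."""
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.linalg.norm(w) * np.linalg.norm(t)
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos  # the face center is (r + th // 2, c + tw // 2)
```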

First Step (Experimental Work, cont.):


Third experiment: after reading some papers, we found that template matching is not an efficient method when used alone in this field; it is used side by side with another idea that generates candidate areas, with template matching then choosing the right area among those candidates. The new idea is to build a skin model (range) by taking fixed small samples of different skin tones (white, yellow, and black), converting the RGB samples to the HSV domain and then to the YCrCb domain, and using the three components of all samples to build a 3D Gaussian distribution by calculating the mean and variance of each component. This distribution indicates the range of the skin. This step is fixed and done once, before any detection trial. We then scan pixel by pixel to mark candidate areas and correlate them using template matching to choose the best area to segment. The results of this algorithm were good, with high accuracy (between 80% and 90%) and convenient execution time, so we considered it a successful algorithm that could pass to the next step.
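A hedged sketch of the two fixed parts of this algorithm, building the 3D Gaussian model and marking candidate pixels. We assume the three components are (H, Cr, Cb) and use a Mahalanobis-distance test, since the slide does not fix these details; the candidates would then be ranked with the template matcher from the previous sketch:

```python
import numpy as np


def build_skin_model(samples):
    """samples: (N, 3) per-pixel values taken from small patches of
    different skin tones (we assume the (H, Cr, Cb) components).
    Returns the mean and inverse covariance of the 3D Gaussian model."""
    mean = samples.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(samples.T))
    return mean, inv_cov


def candidate_mask(features, mean, inv_cov, thresh=3.0):
    """features: (H, W, 3) per-pixel components. Marks pixels whose
    Mahalanobis distance to the skin mean is below `thresh`."""
    d = features - mean
    m2 = np.einsum('...i,ij,...j->...', d, inv_cov, d)
    return m2 < thresh ** 2
```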

First Step (Experimental Work, cont.):


Fourth experiment: we went back to the skin detection algorithm with the fixed range and tried to fix it and improve its accuracy. Our trials succeeded in finding some problems in our first implementation, and we fixed them after reading about the algorithm and studying it well: some fatal mistakes in the way we had applied the algorithm in software had made the first trial's results very bad. After the fixes we reached an accuracy of about 98% with convenient execution time, so we considered this a successful algorithm as well, with the right to pass to the next step.

Experimental Work (Second Step):


Now, in the second part, we discuss the effect of using the new images, with the interference of the car's glass, on the algorithms that showed great success on normal images, and study whether their performance is affected. These experiments are discussed in the following points:
i. On entering the new images, which contain faces behind car glass, we found that the two algorithms failed dramatically. The glass and the light reflections from it destroy both algorithms, so we had to study the two algorithms and analyze the causes of this terrible failure in their performance.
ii. After studying the skin detection algorithm, we found that the car glass totally changes the fixed skin range in the YCrCb domain, so no image works accurately with the normal range.

Experimental Work (Second Step, cont.):


iii. After studying the second algorithm (building the model and template matching), we found that the problem arises at the phase of building the skin model: the small skin samples used to build the distribution are of normal skin, but we use that distribution to detect an abnormal face (behind glass).
iv. At this point we found ourselves in a big problem, approximately back at square one, as all our algorithms had failed. We concluded that the only solution was to modify those algorithms with new ideas of our own to face the new element of glass. We spent a period of time thinking and experimenting to solve the problems of our algorithms.

Experimental Work (Second Step, cont.):


First, how we solved the problems of the skin model and template matching algorithm:

The direct idea was to generate new skin samples taken from behind glass to replace the old samples captured in normal conditions without glass. We generated new skin samples from the new images and built our new 3D distribution from them (with glass) to detect faces in the same type of image, so the model and the faces were similar, which made sense to us. The template matching part had no problems after that. The problem was solved, and the results were very good, with an accuracy of about 92-93% (compared with the zero results before solving the problem). But there was a small issue: we had to collect a bigger number of skin samples than for normal skin, and the execution time was a little long. This directed us to try to fix the other algorithm in search of better performance.

Experimental Work (Second Step, cont.):

Then, how we solved the problems of the skin detection algorithm: at the beginning we tried to modify the fixed skin range in the YCrCb domain, but it didn't work; if we plot the distribution of the new range, it is not as precise as the old one, and its variance is so large that it mixes with the range of the background, making the range impossible to identify. The solution came from a new idea: isolating and removing the layer of the glass so that the old range could be used normally, as with the original images. This idea was a great success and led to an accuracy of 98-99% with short execution time. Finally, we took our decision to use the skin detection algorithm, after adding the glass-layer-removal idea, as our technique.

Summary of experimental work:

First Step:

Algorithm                                   Accuracy             Execution time   Passed or not
Skin detection (first trial)                Under 50%            long             Not passed
Template matching alone                     In the range of 50%  long             Not passed
Building skin model and template matching   Between 80% and 90%  convenient       Passed
Skin detection (second trial)               98%                  convenient       Passed

Summary of experimental work:

Second Step, before solutions:

Algorithm                                   Accuracy   Passed or not
Skin detection                              failed     Not passed
Building skin model and template matching   failed     Not passed

Summary of experimental work:

Second Step, after solutions:

Algorithm                                   Accuracy   Execution time   Taken or not taken
Skin detection                              98-99%     short            Taken
Building skin model and template matching   92-93%     long             Not taken

Conclusion:
After studying the main algorithms and trying them, first on the normal images (without glass) and then on the new images (with glass), we found that the best results always come from the algorithms based on the skin detection idea, not the template matching ones. So we took our decision to use the skin detection algorithm after adding the idea of removing the glass layer.

Second Phase: Face Recognition

Introduction:

The face is the primary focus of attention in social life, playing a big role in conveying identity and emotion. The human ability to recognize faces is remarkable; without any doubt, the human brain is the greatest face recognition system on earth. The computational approach taken in this system, called 'eigenfaces', is motivated by information theory as well as by the practical requirements of near-real-time performance and accuracy. We also used principal component analysis (PCA) as a method of dimensionality reduction for the algorithm.

Why is this system different from other face recognition systems?
The difference is that the system will be trained, tested, and used on new images (with glass), as we said in the face detection phase, so we will study its performance and decide whether its results are acceptable. We searched hard for a database that meets our requirement of images with glass, so that we could train and test the system on the kind of images it will deal with in practical life, but we didn't find any ready-made database matching our needs. So we took our decision to design the database ourselves to get our specifications.

Experimental work:
Implementing the eigenfaces algorithm:
i. Obtain face images I1, I2, ..., IM (the training faces) as matrices.
ii. Represent every image Ii as one single face vector Γi (N^2 x 1).
iii. Put all those vectors beside each other to form a matrix A (N^2 x M), called the face space, then calculate the average (mean face) of this matrix as follows: Ψ = (1/M) * Σ_{i=1..M} Γi.
iv. Subtract the mean face from each vector: Φi = Γi − Ψ, and let q = [Φ1 Φ2 ... ΦM]; the covariance matrix is then C = q * q^T (N^2 x N^2).
v. Compute the eigenvectors ui and eigenvalues λi of C.
vi. Order the eigenvalues downward from the maximum eigenvalue, ordering the eigenvector matrix (e) accordingly, so most of the weight of the data is concentrated in the largest eigenvalues; we then reduce the size of the eigenvector matrix to a specific number of columns according to the required efficiency.
vii. Project the matrix q onto the reduced eigenvector matrix: F = (e)^T * q.
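A minimal numpy sketch of steps iii-vii (not the project's MATLAB code). Instead of forming the huge N^2 x N^2 covariance directly, it uses the standard trick of taking the eigenvectors of the small M x M matrix q^T q and lifting them back:

```python
import numpy as np


def train_eigenfaces(images, k):
    """images: M same-sized 2-D gray arrays (training faces); k: number of
    eigenfaces kept. Returns the mean face psi, the reduced eigenvector
    matrix e (N^2 x k), and the projected face space F = e^T q (k x M)."""
    A = np.stack([im.ravel().astype(float) for im in images], axis=1)
    psi = A.mean(axis=1, keepdims=True)      # step iii: mean face
    q = A - psi                              # step iv: centered vectors
    vals, V = np.linalg.eigh(q.T @ q)        # M x M trick for q q^T
    order = np.argsort(vals)[::-1][:k]       # step vi: largest eigenvalues
    e = q @ V[:, order]                      # lift to N^2-dim eigenvectors
    e /= np.linalg.norm(e, axis=0)           # normalize the columns
    return psi, e, e.T @ q                   # step vii: F = e^T q
```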

Classification:
i. Now we have an input image coming from the face detection phase and want to recognize it by comparing it with the face space, so we convert it to a vector (f0) as well.
ii. We subtract the mean face from (f0) and project it onto the reduced eigenvector matrix too: Test_f = (e)^T * f0.
iii. The next step is to find the column of the face space with the minimum Euclidean distance to (Test_f); that column indicates an image, and this image is our decision.
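And a matching sketch of the classification steps, assuming psi, e and F come from the training sketch above:

```python
import numpy as np


def classify(face, psi, e, F):
    """face: detected 2-D gray face image; psi, e, F: from training.
    Returns the index of the face-space column nearest to the projection."""
    f0 = face.ravel().astype(float)[:, None] - psi  # subtract mean face
    test_f = e.T @ f0                               # Test_f = e^T f0
    dists = np.linalg.norm(F - test_f, axis=0)      # Euclidean distances
    return int(dists.argmin())                      # matched image index
```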

Why did we use PCA?


A question may arise about the usage and benefit of principal component analysis. PCA aims to capture the total variation in the set of training faces and to explain that variation with a few variables. An observation described by a few variables is easier to understand than one defined by a huge number of variables, and when many faces have to be recognized, dimensionality reduction is important. It also removes a huge number of mathematical operations that add no extra information, which saves execution time and reduces the complexity of the system.

Conclusion:
After studying the results of the system, which gave an accuracy of about 99% on our limited database, we found the performance very good and acceptable. The performance of the system was not affected by using the new images (with glass); it gave approximately the same accuracy as with the original images (without glass).

Third Phase: Plate Localization and Character Segmentation

Introduction:

We now discuss a very important step in this project: the localization of the vehicle's plate number and then its segmentation. After that we can segment the characters written on the plate and pass them to the next step of the project, optical character recognition (OCR), so that the automatic system can recognize those characters and compare them with the stored database of plates that have the right to access the gate.

Background of previous work:


1. Smearing algorithm:
Smearing is a method for extracting text areas from a mixed image.
A. Convert the image to binary.
B. With the smearing algorithm, the image is processed along vertical and horizontal runs (scan lines).
C. If the number of white pixels in a run is < 10, the pixels become black; otherwise, no change. If the number of white pixels is > 100, the pixels also become black; otherwise, no change.
2. Image correlation algorithm:
The main idea of the correlation algorithm is to take an image representing the plate, correlate it with the test image, and choose the most highly correlated region.
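A sketch of one smearing pass under our reading of the thresholds above (white runs outside [10, 100] pixels are blackened; the vertical pass is the same function applied to the transpose):

```python
import numpy as np


def smear_runs(binary, min_run=10, max_run=100):
    """One horizontal pass of the smearing rule: white runs shorter than
    min_run or longer than max_run are turned black, keeping only
    text-like runs. binary: 2-D boolean array."""
    out = binary.copy()
    n_rows, n_cols = binary.shape
    for r in range(n_rows):
        c = 0
        while c < n_cols:
            if binary[r, c]:
                end = c
                while end < n_cols and binary[r, end]:
                    end += 1                 # find the end of the white run
                if not (min_run <= end - c <= max_run):
                    out[r, c:end] = False    # erase non-text-like runs
                c = end
            else:
                c += 1
    return out
```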

3. Horizontal and vertical edge processing algorithm:
Pipeline: color-to-gray conversion → dilation → horizontal edge processing → vertical edge processing → segmentation → region-of-interest extraction → output image.

Experimental Work:
1. Smearing algorithm:
A. Convert the color image to a gray image.
B. Convert the gray image to a binary image.
C. Apply the thresholds and process the image along vertical and horizontal runs.
D. Dilate and then erode to keep only the useful area.

Results: we reached an accuracy of 20%.
Problems:
i. The sunlight problem.
ii. Changing the thresholds was useless.

2. Image correlation algorithm:
A. Convert the color image to a gray image.
B. Convert the gray image to a binary image.
C. Search for an object that is repeated on all license plates to correlate with. We found the word ( ) on every license plate, so we tried to correlate with this word to locate the plate, but we got nothing.

Results: we reached an accuracy of 10%.
Problems: we convert the image to binary, and in most cases this word becomes black, so the algorithm fails to find the required area.

3. Horizontal and vertical edge processing algorithm:

A. First experiment:
A. Convert RGB to gray.
B. Feed the gray image to a function called <LPCROP>. This is not a built-in function; we wrote it ourselves. It depends on an edge filter and searches for the plate by its size.
Input: a gray image. Output: a smaller image that holds only the license plate.

[figures: the gray image and the cropped image]

C. Convert the resulting image to binary using a gray threshold.
D. Count the objects in the resulting image, then loop over them and apply conditions on the area and height of each extracted object; we then obtain the numbers and characters of the license plate.

Results: we reached an accuracy of 50%.
Problems:
i. The sunlight problem.
ii. The binary conversion threshold may delete some numbers.
iii. The execution time was long.

B. Second experiment:
A. Convert RGB to gray.
B. Enhance the contrast of the gray image.
C. Scan vertically to find the maximum edge density.
D. Find the right and left positions from the standard deviation value.
E. Use those positions to crop the license plate from the image.
F. Convert the cropped image to binary and extract its letters, as in steps C and D of the first experiment.

Results: we reached an accuracy of 60%.
Problems: the same as the first experiment, but we made some progress because of using the standard deviation.
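Since the slides do not give the exact formula, here is our guess at how steps C-E of this experiment could look: per-column edge density plus a mean-and-standard-deviation cut-off to bound the plate horizontally:

```python
import numpy as np


def plate_band(gray):
    """Our guess at steps C-E: per-column edge density locates the plate
    horizontally; the mean plus one standard deviation of the density is
    the cut-off bounding the crop on the left and right."""
    edges = np.abs(np.diff(gray.astype(float), axis=1))  # vertical edges
    density = edges.sum(axis=0)                          # energy per column
    strong = np.flatnonzero(density > density.mean() + density.std())
    if strong.size == 0:                                 # nothing found
        return 0, gray.shape[1] - 1
    return strong.min(), strong.max()                    # left, right bounds
```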

C. Third experiment:
A. Steps 1-5: the same as the second experiment.
B. Using the positions of the two points around the standard deviation, crop the license plate plus some area above and below it, so that even with some error in detecting the license plate we always get the plate.
C. Extract the plate letters and numbers from the cropped image.
Results: we reached an accuracy of 72%.
Problems: the binary conversion threshold problem.

D. Fourth experiment:
A. Repeat the same steps as in the third experiment until cropping the plate area.
B. Convert to binary with different thresholds (0.25, 0.35, 0.5, 0.65, 0.75, 0.85).
C. Choose between thresholds based on which binary image contains more white pixels and has 6 to 8 objects, matching the number of letters and numbers on the plate.
D. Extract the letters and numbers.
Results: here we reached an accuracy of 83%.

Disadvantages: the choice of the best binary threshold.

[figure: the cropped image converted to binary with different thresholds]
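A sketch of the threshold-selection rule of this experiment, assuming the cropped plate is a gray image scaled to [0, 1] and using scipy's connected-component labelling to count objects:

```python
import numpy as np
from scipy import ndimage


def best_binarization(plate_gray,
                      thresholds=(0.25, 0.35, 0.5, 0.65, 0.75, 0.85)):
    """Try every threshold on the cropped plate; keep the binary image
    that has 6-8 connected objects (the plate's letters and numbers)
    and, among those, the most white pixels."""
    best, best_white = None, -1
    for t in thresholds:
        bw = plate_gray > t
        _, n_objects = ndimage.label(bw)     # count connected components
        if 6 <= n_objects <= 8 and bw.sum() > best_white:
            best, best_white = bw, int(bw.sum())
    return best  # None if no threshold yields 6-8 objects
```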

E. Fifth experiment:
A. The same first steps as in the fourth experiment.
B. We noticed that most characters and numbers are centered in the middle of the image, so the choice between thresholds was based on which binary image contains more objects centered in the middle of the image.

Results: here we reached an accuracy of 70%.
Problems: accuracy decreased because this way of choosing between thresholds causes a problem.

F. Sixth experiment:
A. Repeat the same steps as in the third experiment until cropping the plate area.
B. We noticed that the best binary threshold was around the global threshold.
C. Choose between thresholds based on which binary image contains more white pixels and has 6 to 8 objects, matching the number of letters and numbers on the plate.
D. After choosing the best threshold, extract the letters and numbers.

Results: here we reached an accuracy of 90%.
Problems: the choice of the best threshold still causes problems.

G. Seventh experiment:
A. Repeat the same steps as in the third experiment until cropping the plate area.
B. The choice between thresholds again depends on which binary image contains a large number of white pixels, but the difference here is that we built more than one pattern, each containing more than one threshold: we choose the pattern first, then choose the threshold within it.
C. After choosing the best threshold, extract the characters and numbers.
Results: we reached an accuracy of 99%.

Problems:
In 99% of our pictures the characters and numbers were extracted, but with some unwanted objects.

Removing Unwanted Objects:

Method                                                                Accuracy
White pixels & black pixels                                           <10%
Sorting differences between object centroids & image centroid
  (take the minimum 6)                                                ~60%
Using NN                                                              ~80%
Cropping the plate area only & using NN                               ~85%
Object centroid differences, sorted (take the minimum 6)              ~90%
Scanning row by row from above & below (stop on finding 6 objects)    ~90%
Trapping the plate in a rectangle to extract it only                  ~95%

Fourth Phase: Optical Character Recognition

1. Introduction:
1. What is OCR?
OCR is the acronym for Optical Character Recognition. This technology allows a machine to automatically recognize characters through an optical mechanism.

2. History of OCR:
In 1914, Emanuel Goldberg developed a machine that read characters and converted them into standard telegraph code, and Edmund Fournier d'Albe developed the Optophone.

2. Background:
Previous work:
1. Gurumukhi script OCR.
2. Multi-feature extraction for printed Thai characters.

3. Features:
A. 32 slope feature:
1. Extract each letter in the smallest possible area and resize it.
2. Get the centroid of each segmented letter.
3. Record the rows and columns of every data pixel.
4. Calculate the slope for each data pixel: (y of point − y of centroid) / (x of point − x of centroid).
5. Divide the segmented letter into 32 slope lines.
6. Get the average of the rows of all data pixels lying on each slope line, and similarly for the columns.
7. Calculate the Euclidean distance between each row of the resulting matrix and the centroid.
(A sketch follows.)
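A sketch of this feature; to sidestep infinite slopes we bin pixels by the angle to the centroid rather than the raw slope value, so treat it as our interpretation rather than the paper's exact recipe:

```python
import numpy as np


def slope_feature(glyph, n_bins=32):
    """glyph: 2-D boolean array of one segmented, cropped, resized letter.
    Returns one distance per slope bin (32 by default), per steps 1-7."""
    rows, cols = np.nonzero(glyph)              # step 3: data pixels
    cr, cc = rows.mean(), cols.mean()           # step 2: centroid
    ang = np.arctan2(rows - cr, cols - cc)      # step 4 (angle form)
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    feat = np.zeros(n_bins)
    for b in range(n_bins):                     # steps 5-7
        sel = bins == b
        if sel.any():
            feat[b] = np.hypot(rows[sel].mean() - cr,
                               cols[sel].mean() - cc)
    return feat
```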

Features cont.:
B. 16 sectors feature:
1. Extract each letter in the smallest possible area.
2. Resize it.
3. Get the centroid of each segmented letter.
4. Divide the segmented letter into 16 equal sectors.
5. Obtain the center of gravity of each sector.
6. Calculate the Euclidean distance between the center of gravity of each sector and the centroid of the segmented letter. The result is D = [d1 d2 d3 ... dn], where di is the Euclidean distance between the center of gravity of sector i and the centroid of the segmented letter, and n = 16.
(A sketch follows.)
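The same angular-binning machinery gives a sketch of steps 3-6 of this feature; only the bin count differs from the slope sketch:

```python
import numpy as np


def sector_feature(glyph, n_sectors=16):
    """Divide the cropped, resized letter into n_sectors equal angular
    sectors around its centroid; return the Euclidean distance from each
    sector's center of gravity to the letter centroid."""
    rows, cols = np.nonzero(glyph)
    cr, cc = rows.mean(), cols.mean()           # letter centroid
    ang = np.arctan2(rows - cr, cols - cc)
    sec = ((ang + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    D = np.zeros(n_sectors)
    for s in range(n_sectors):
        sel = sec == s
        if sel.any():                            # sector center of gravity
            D[s] = np.hypot(rows[sel].mean() - cr, cols[sel].mean() - cc)
    return D
```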

Features cont.:
C. 144 combined feature:
To understand it, we first explain two features: the sixteen block feature and the eight sectors feature. The 144 combined feature combines:
1. The modified eight sectors feature: the resulting vector has size 8 x 16 = 128 elements, F = [E(1,1) E(2,1) ... E(8,16)], where E(i,j) is the center of gravity of sector number i in block number j.
2. The sixteen block feature.

Features cont.:
We combined the two previous features as follows:
E = [E(1,1) E(2,1) ... E(8,16), alpha * [D(1) D(2) ... D(16)]]
where E(i,j) is the center of gravity of sector number i in block number j, giving 128 + 16 = 144 elements. (A sketch follows.)
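The combination itself is a weighted concatenation; a one-liner makes the 128 + 16 = 144 structure explicit (alpha = 0.2 is chosen from the graph below):

```python
import numpy as np


def combined_144(E, D, alpha=0.2):
    """E: the 128-element modified eight-sectors vector (8 sectors x 16
    blocks); D: the 16-element sixteen-block vector; alpha weights D.
    Returns the 144-element combined feature vector."""
    return np.concatenate([np.ravel(E), alpha * np.ravel(D)])
```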

How do we choose the best value of alpha?

By testing all the test data at different values of alpha with the NN classifier, we obtained a graph of alpha versus accuracy.

Features cont.:
From the graph we chose alpha = 0.2, as it has the best accuracy. [figure: accuracy versus alpha]

4. Classifiers
A. Nearest Neighbour classifier (NN):
The expected class is the class of the training sample whose feature vector is nearest (least Euclidean distance) to the test sample's feature vector. (A sketch follows.)
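As a sketch, the whole rule is a few lines of numpy:

```python
import numpy as np


def nn_classify(test_vec, train_feats, train_labels):
    """Nearest-neighbour rule: the training sample with the least
    Euclidean distance to the test vector decides the class."""
    d = np.linalg.norm(train_feats - test_vec, axis=1)
    return train_labels[int(d.argmin())]
```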

B. Support Vector Machine classifier (SVM):
We look for the maximum gap (margin) between the two classes. The algorithm of the SVM is shown in the next figure. [figure: SVM algorithm]


Classifiers cont.:
- Training one SVM per pair of the 26 classes gives 325 structures (25 + 24 + 23 + ... + 3 + 2 + 1).
- We classify the feature vector of each test sample with all of them; the result is a vector of 325 elements, each representing a class vote.
- We take the most frequent class as the expected class. (A sketch follows.)
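A hedged sketch of this one-vs-one scheme using scikit-learn's SVC (the kernel choice and integer 0..25 labels are our assumptions):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC


def ovo_train(X, y):
    """One binary SVM per pair of the 26 classes: 26 * 25 / 2 = 325
    models, matching the 325 structures above."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        sel = (y == a) | (y == b)
        models[(a, b)] = SVC(kernel='linear').fit(X[sel], y[sel])
    return models


def ovo_predict(models, x):
    """Classify one test vector with all 325 SVMs, then majority-vote."""
    votes = [int(m.predict(x[None, :])[0]) for m in models.values()]
    return int(np.bincount(votes).argmax())
```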

C. Artificial Neural Networks Classifier:


A neural network is a powerful data modeling tool able to capture and represent complex input/output relationships. Neural networks resemble the human brain in two ways: they acquire knowledge through learning, and the knowledge is stored in inter-neuron connection strengths known as synaptic weights.

Classifiers cont.:
D. Combining all three classifiers:
For each test sample we expect the class that is predicted more than once; if all three predicted classes differ, we expect the class predicted by the classifier whose accuracy for this feature is the best. (A sketch follows.)
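The decision rule itself is tiny; a sketch with our own names:

```python
def combine_predictions(preds, best_idx):
    """preds: the three classifiers' predicted classes for one test
    sample. Take any class predicted more than once; if all three
    disagree, fall back to the classifier with the best stand-alone
    accuracy for this feature (index best_idx)."""
    for p in preds:
        if preds.count(p) > 1:
            return p
    return preds[best_idx]
```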

5. Experimental Work:
Checking Our Features:
We tried them on handwritten English numbers (0 1 2 3 4 5 6 7 8 9).

Feature            Accuracy
32 slope (NN)      91.4%
16 sectors (NN)    82.38%
144 combined (NN)  94.82%

Experimental Work cont.:


B. Checking for the best resizing:
We worked on training data of 592 samples distributed over 26 classes and test data of 531 samples distributed over the same 26 classes.

Size    32 slope NN  32 slope SVM  16 sectors NN  16 sectors SVM
96*24   84.5574%     90.3955%      90.2072%       89.4539%
96*32   86.8173%     89.8305%      90.7721%       90.9605%
96*36   84.7458%     87.7589%      89.6422%       90.3955%
96*40   89.2655%     88.7006%      89.8305%       90.2072%
96*48   88.1356%     89.2655%      90.7721%       90.3955%
104*48  88.1356%     88.7006%      90.5839%       91.3371%
112*48  88.5122%     87.5706%      90.2072%       90.9605%
120*48  87.5706%     88.1356%      90.3955%       91.9021%
128*48  84.3691%     87.9473%      90.2072%       90.5838%

We took a size of 104*48.

Experimental Work cont.:


C. Our final data:
We increased the data to: training data of 1207 samples distributed over 26 classes and test data of 1190 samples distributed over the same 26 classes.

D. Our first experiment to obtain high accuracy:


Feature/classifier  NN        SVM       Neural Network  Combined classifier
32 slope            82.0168%  84.1176%  72.3529%        85.3782%
16 sectors          87.4790%  85.4622%  82.5210%        88.4874%
144 combined        93.4454%  86.8908%  91.2605%        92.6891%

Experimental Work cont.:


E. Checking the best number of slope lines for the slope feature:

Dimension  NN %     SVM %    Neural Network %  Combined classifier %
8          86.4709  85.8824  72.9412           86.7227
16         84.2857  83.3613  67.1429           85.1261
24         82.4370  82.2689  69.9160           83.8655
32         82.0168  84.1176  72.3529           85.3782
40         78.4874  80.0840  69.7479           80.0840
48         75.5462  78.9076  67.2269           78.7395
56         78.5714  79.1597  67.8151           78.4874
64         76.9748  81.9328  71.4286           80.1681
72         75.1261  79.0756  60.0840           78.9076
80         74.8739  77.0588  62.5210           77.5630

Experimental Work cont.:


So the best dimension for the slope feature is 8, as it has the best accuracy (86.7227%), delivered by the combined classifier.

F. Reduction of the 144 feature using PCA:
The 144 combined feature takes a lot of memory, so we must reduce it while leaving the accuracy largely unaffected. Steps (a sketch follows):
1. Compute the mean.
2. Compute the covariance.
3. Compute the eigenvalues and eigenvectors of the covariance matrix.
4. Order the eigenvalues by magnitude.
5. For many datasets, most of the eigenvalues are negligible and can be discarded.
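A direct numpy sketch of these five steps, reducing the 144-element vectors to k = 32:

```python
import numpy as np


def pca_reduce(X, k=32):
    """X: (n_samples, 144) feature matrix. Follows the five steps above
    and keeps the k largest eigenvalues (k = 32 in our reduction).
    Returns the projected data plus the mean and basis for test samples."""
    mean = X.mean(axis=0)                    # 1. mean
    Xc = X - mean
    cov = np.cov(Xc.T)                       # 2. covariance
    vals, vecs = np.linalg.eigh(cov)         # 3. eigenvalues/eigenvectors
    W = vecs[:, np.argsort(vals)[::-1][:k]]  # 4-5. keep the top-k
    return Xc @ W, mean, W
```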

Experimental Work cont.:

Results of the reduced feature:

Accuracy of NN  Accuracy of SVM  Accuracy of Neural network  Accuracy of Combined classifier
89.6639%        84.2017%         87.7311%                    89.7479%

Experimental Work cont.:


G. Second experiment, using K-Means:
We wanted to reduce the number of training samples per class to shrink the memory used.

K-Means algorithm (a sketch follows):
1. Initialize the centroids of the K clusters randomly.
2. Assign each sample to the nearest centroid.
3. Recalculate the centroids (means) of the K clusters.
4. If the centroids are unchanged, stop; otherwise, go to step 2.
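A plain numpy sketch of these four steps:

```python
import numpy as np


def kmeans(X, k=20, iters=100, seed=0):
    """Plain K-means following the four steps above. X: (n_samples,
    n_feats) training samples of one class; the returned k centroids
    replace them as the reduced training set."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # step 1
    for _ in range(iters):
        # step 2: assign each sample to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: recompute every cluster mean (keep empty clusters)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):      # step 4: stop when unchanged
            break
        centroids = new
    return centroids
```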

Experimental Work cont.:


Results at k = 20:

Feature                NN %     SVM %    Neural Network %  Combined classifier %
8 slope                85.1261  84.7899  48.9916           84.8739
16 sectors             86.4706  85.8824  75.2941           87.0588
144 combined           93.2773  86.1345  90.0840           93.5294
32 (reduced from 144)  89.2437  82.0168  85.6305           89.3277

Experimental Work cont.:


H. First trial for increasing the accuracy:
Using the 144 feature and the 32-element reduction of it, we applied the concept of KNN: with k = 5, for example, if the class of the test sample is among the classes of any of the 5 nearest training samples, we classify the test sample from these 5 classes. We took k = 2 and k = 5 (a sketch follows the table).

K  144 feature SVM  144 feature minimum distance  32 reduced from 144 SVM  32 reduced from 144 minimum distance
2  92.0168%         91.7647%                      89.4118%                 88.8235%
5  91.0084%         86.2185%                      87.7311%                 84.2017%

But the accuracy didn't increase.
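For reference, a standard k-NN majority vote is a simplified stand-in for the rule described above:

```python
import numpy as np
from collections import Counter


def knn_classify(test_vec, train_feats, train_labels, k=5):
    """Standard k-NN: look at the k nearest training samples and take
    the most common class among them (a simplification of the trial's
    rule of classifying from the k nearest samples' classes)."""
    d = np.linalg.norm(train_feats - test_vec, axis=1)
    nearest = [train_labels[i] for i in np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]
```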

Experimental Work cont.:


I. Second trial for increasing the accuracy:
Results when replacing the test samples that cause errors:

Feature                                  NN accuracy  SVM accuracy
144 combined                             95.0378%     90.0757%
144 combined with K-Means (k=20)         95.0378%     88.9823%
32 reduced from 144                      92.6829%     87.9731%
32 reduced from 144 with K-Means (k=20)  92.3465%     85.7864%

Conclusion
The first confusion matrix is for the best accuracy before replacing erroneous test data, with an accuracy of 93.5294%. The second confusion matrix is for the best accuracy after replacing erroneous test data, with an accuracy of 95.0378%; fortunately, this happened with K-Means at K = 20, and therefore with a smaller memory size.
