
Artificial Intelligence

By: Kelvin Phan

ABSTRACT: The aim of this assignment is to compare and contrast a symbolic AI approach with a non-symbolic AI approach to the same data.

Table of Contents
I. Introduction
II. Neural Networks
    1. Designing the Dataset
    2. Setting up the Neural Network (ANN)
    3. Testing the Neural Network
    4. Analysis
III. Symbolic Machine Learning
    1. ID3 Algorithm
    2. PRISM Algorithm
IV. Conclusion
V. Appendix
    1. Pattern Designing
    2. Dataset
    3. Artificial Neural Network
    4. ID3 Algorithm
    5. PRISM Algorithm

I. Introduction:

The aim of this assignment is to compare and contrast a symbolic AI approach with a non-symbolic AI approach to the same data. A Vietnamese alphabet dataset of ten samples will be trained and tested with different types of mathematical model: an artificial neural network (ANN) and the symbolic learning algorithms ID3 and PRISM. The WEKA tool will be used as the machine learning software to train and test the data. The results will be compared, and a recommendation is attached at the end of the report.

II. Neural Networks:

1. Designing the Dataset:
Ten capital letters were chosen as the visual patterns for the dataset: A, Ă, Â, B, C, D, Đ, E, Ê, G. The Vietnamese alphabet combines the English alphabet (a, b, c, etc.) with some special symbols such as the circumflex (^), the breve (˘) and the stroke of đ. Some vowels, such as ă, â, ê and ô, differ from their originals a, e and o by the addition of these symbols, so that they suit old-style Vietnamese (Sino-Vietnamese) pronunciation. These letters were chosen because they are complicated visual patterns compared to the English alphabet or Latin numerals, so they make a more challenging test of the accuracy of the mathematical models compared later. In machine learning, a convenient way to present such patterns is a pixel matrix of 1s and 0s, so every visual pattern is presented in a matrix with seven columns and twelve rows (7x12), as shown in figure 1 below. A position marked x is occupied by a part of the letter and is encoded as 1 in the data; a blank position is encoded as 0.

Figure 1
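The flattening described above can be sketched in a few lines of Python. This is a minimal illustration with a made-up grid, not the actual pattern of any letter in the dataset:

```python
# A 7-column x 12-row grid where 'x' marks cells occupied by the letter.
# The shape below is invented for illustration only.
GRID = [
    ".......",
    ".......",
    ".......",
    "...x...",
    "..x.x..",
    "..x.x..",
    ".x...x.",
    ".x...x.",
    ".xxxxx.",
    "x.....x",
    "x.....x",
    "x.....x",
]

def grid_to_vector(grid):
    """Encode each cell as 1 ('x', occupied) or 0 (blank), row by row,
    giving the 84-element input vector used by the classifiers."""
    return [1 if cell == "x" else 0 for row in grid for cell in row]
```

Each row of the grid contributes seven attributes (Position1_1 to Position1_7, and so on), matching the ARFF attributes listed in the appendix.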

2. Setting up the Neural Network (ANN):


An artificial neural network, also known simply as a neural network, is a mathematical model inspired by biological neural networks. The network has three parts: input nodes, hidden nodes and output nodes. There are 84 input nodes, one per cell of the pixel matrix, fully connected (through the hidden nodes) to the 10 outputs. The data will be trained with different numbers of hidden nodes to see which network gives the better training results. In WEKA, the neural network is the MultilayerPerceptron function. Before the data is trained, the neural network architecture is set up as follows (figure 2). GUI and autoBuild are set to true; this allows the network to be paused and altered during training and connects the hidden layer into the network. Debug is set to false, so no extra diagnostic output is produced. The number of hidden nodes is changed each time the data is trained: 10, 20 and 30. The learning rate, the amount by which the weights are updated, is set to 0.3. The momentum applied to the weights during updating is 0.2. The nominalToBinaryFilter, normalizeAttributes and normalizeNumericClass options are set to false. The training time is 5000, which is the number of epochs to train through.
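The roles of the learning rate (0.3) and momentum (0.2) can be illustrated with a toy numpy sketch of one backpropagation step for an 84-30-10 network. This is a hypothetical stand-in for intuition only, not WEKA's actual MultilayerPerceptron implementation:

```python
import numpy as np

LR, MOM = 0.3, 0.2                 # learning rate and momentum from the setup
rng = np.random.default_rng(0)

W1 = rng.normal(0, 0.1, (84, 30))  # input -> hidden weights
W2 = rng.normal(0, 0.1, (30, 10))  # hidden -> output weights
vel1 = np.zeros_like(W1)           # previous weight steps, for momentum
vel2 = np.zeros_like(W2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y):
    """One backprop step on a single pattern; returns the squared error
    before the update. New step = LR * gradient + MOM * previous step."""
    global W1, W2, vel1, vel2
    h = sigmoid(x @ W1)                      # hidden activations
    out = sigmoid(h @ W2)                    # output activations
    d_out = (out - y) * out * (1 - out)      # output delta (squared error)
    d_hid = (d_out @ W2.T) * h * (1 - h)     # hidden delta
    vel2 = LR * np.outer(h, d_out) + MOM * vel2
    vel1 = LR * np.outer(x, d_hid) + MOM * vel1
    W2 -= vel2
    W1 -= vel1
    return float(np.mean((out - y) ** 2))
```

Repeating such steps over the 10 patterns for 5000 epochs is, in spirit, what the WEKA training run does.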

Figure 2

After the data is trained with each number of hidden nodes, all runs show very good results. Because all three networks gave the same classification result, the best way to check which one has learnt better is to compare their root mean squared error: the network with the smaller root mean squared error is the better one. As the table below shows, the network with 30 hidden nodes has the smallest value, so it is chosen as the neural network for testing the data.

Hidden nodes    Root mean squared error
10              0.0165
20              0.012
30              0.0106

3. Testing the Neural Network:


The neural network with 30 hidden nodes is chosen to test whether it can handle noisy input vectors. Firstly, testing datasets are created from the ideal Vietnamese alphabet dataset. The first testing dataset contains the same 10 patterns as the ideal dataset, except that one 1 bit somewhere in each input pattern is randomly changed to 0. The second testing dataset is built the same way but with two 1 bits in each pattern randomly changed to 0, and the third with three 1 bits changed to 0 in each pattern.
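The noise-injection step above can be sketched as follows, assuming each pattern is stored as a flat 0/1 list; the function name add_noise is illustrative:

```python
import random

def add_noise(pattern, n_flips, seed=None):
    """Return a copy of a 0/1 pattern with n_flips randomly chosen 1 bits
    changed to 0, as in the three testing datasets (n_flips = 1, 2, 3)."""
    rng = random.Random(seed)
    ones = [i for i, v in enumerate(pattern) if v == 1]
    noisy = list(pattern)
    for i in rng.sample(ones, n_flips):
        noisy[i] = 0
    return noisy
```

Applying this to every pattern with n_flips of 1, 2 and 3 yields the three test sets.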

Figure 3

After training the ideal dataset with the 30-hidden-node neural network, the trained network is tested with the three testing datasets, one by one. Finally, the results are captured for further analysis. They include the root mean square error and the accuracy, sensitivity and specificity calculated from the confusion matrix (figure 4) after each test run.

                        Test 1      Test 2      Test 3
Mean square error       0.0129      0.0154      0.0760
Accuracy (%)            100         100         100
Sensitivity (%)         100         100         100
Specificity (%)         100         100         100
Correct patterns        10 of 10    10 of 10    10 of 10

Figure 4

In the confusion matrix above, the Vietnamese letters appear under substitute labels (a = A, b = Ă, and so on) because WEKA cannot display the Vietnamese letters directly; it writes them using the English alphabet with added symbols.

4. Analysis:
As the table above shows, the mean square error goes up when the learning result is tested with a noisier dataset: it increases from 0.0129 for test 1 to 0.0154 for test 2 and then 0.0760 for test 3. The mean square error shows how different the testing dataset is from the ideal dataset. Even though the mean square error grows, the accuracy, sensitivity and specificity stay perfect at 100%. Accuracy tells how well the algorithm has learnt overall; sensitivity measures its performance on positive results, while specificity measures its performance on negative results. As a result, the ANN recognizes all the patterns well even when tested with the different noise datasets. The neural network starts with randomly chosen weights (the input-to-hidden weights) for each attribute (position), and the weights are updated in each iteration; the network keeps learning with every repetition until it reaches the minimum error value, which is the difference between the ideal output and the network's output. Those weights, together with the hidden-to-output weights, are then used to identify the patterns. The weights represent the memory of the neural network. In the noise datasets only some of the positions are changed from 1 to 0, so the error values do not change much and the algorithm can still recognize the patterns well.
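The accuracy, sensitivity and specificity discussed above can be computed from a confusion matrix as sketched below (macro-averaged over classes). This is a simplified illustration of how such figures are derived, not WEKA's own code:

```python
def metrics(cm):
    """Accuracy, macro-averaged sensitivity and specificity from a square
    confusion matrix (rows = actual class, columns = predicted class)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    acc = sum(cm[i][i] for i in range(n)) / total
    sens, spec = [], []
    for i in range(n):
        tp = cm[i][i]                               # true positives
        fn = sum(cm[i]) - tp                        # false negatives
        fp = sum(cm[r][i] for r in range(n)) - tp   # false positives
        tn = total - tp - fn - fp                   # true negatives
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    return acc, sum(sens) / n, sum(spec) / n
```

A perfect confusion matrix (all counts on the diagonal) gives 100% for all three values, which is the ANN's result on every test set.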

III. Symbolic Machine Learning:

1. ID3 Algorithm:
ID3 stands for Iterative Dichotomiser 3; it is the predecessor of the C4.5 algorithm. It is used to generate a decision tree from a dataset. Firstly, the dataset is trained by the ID3 algorithm, and the results include the decision tree. The decision tree is the shortest tree that separates the patterns: for example, when positions 5_2, 1_3 and 1_4 = 0 and position 6_3 = 1, the pattern is recognized as A (figure 5). Then the result is tested with the three testing datasets to see how well the algorithm has learnt.
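At each node, ID3 picks the attribute with the highest information gain, which is what produces the short trees described above. A minimal sketch of that quantity, with illustrative helper names:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on attribute index attr;
    ID3 chooses the attribute that maximises this at each node."""
    n = len(labels)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr], []).append(label)
    remainder = sum(len(ls) / n * entropy(ls) for ls in split.values())
    return entropy(labels) - remainder
```

An attribute that splits the classes perfectly (like a pixel that is 1 for exactly one group of letters) gets the maximum gain and becomes a node near the root.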

Figure 5

                        Test 1     Test 2     Test 3
Mean square error       0.2        0.25       0.28
Accuracy (%)            80         70         60
Sensitivity (%)         80         70         60
Specificity (%)         75         60         50
Correct patterns        8 of 10    7 of 10    6 of 10
Unclassified patterns   0 of 10    0 of 10    0 of 10

The values in the table above were calculated from the testing results of the learnt model. As the table shows, the mean square error gets bigger when the training result is tested with a noisier dataset: it rises from 0.2 for test 1 to 0.25 for test 2 and then 0.28 for test 3.

Furthermore, the accuracy, sensitivity and specificity also decrease with the noisier datasets: accuracy and sensitivity drop from 80% to 70% and then 60%, and specificity goes down from 75% to 60% and then 50%. As a result, the algorithm correctly recognizes only eight patterns out of ten on the test 1 dataset, seven out of ten on test 2 and six out of ten on test 3. Even though the algorithm gave some incorrect responses, it still produced a prediction for every pattern; none were left unclassified. ID3 trains on the ideal dataset and produces the shortest decision tree for it, since a shorter decision tree is a better one. Based on that tree, ID3 predicts the patterns: for example, when positions 5_2, 1_3 and 1_4 equal 0 and position 6_3 equals 1, the algorithm recognizes pattern A. When the testing datasets are created, some positions are changed at random, and these can be exactly the positions the decision tree uses to recognize the patterns. For example, in the test 1 dataset, position 1_4 may have been changed from 1 to 0 in pattern Â, so the tree follows the wrong branch and misclassifies it as A.

2. PRISM Algorithm:
The PRISM algorithm is a model that produces generalized "if ... then ..." rules from the dataset. Similarly to ID3, the dataset is trained with the PRISM algorithm to produce a set of rules, for example: if position 5_3 = 0 and position 2_3 = 0, the pattern is recognized as A (figure 6). Then the result is tested with the three testing datasets to see how well the algorithm has learnt.
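Applying a PRISM-style rule set can be sketched as a first-match lookup. The single rule below for A is taken from the text; everything else is illustrative, and returning None is how an unclassified pattern arises:

```python
# Each rule is (conditions, class); conditions map attribute names to the
# required 0/1 value. The rule for A comes from the report's example.
RULES = [
    ({"Position5_3": 0, "Position2_3": 0}, "A"),
    # ... further rules per class, as induced by PRISM
]

def classify(example, rules):
    """Return the class of the first rule whose conditions all hold,
    or None when no rule covers the example (an unclassified pattern)."""
    for conditions, label in rules:
        if all(example.get(attr) == value for attr, value in conditions.items()):
            return label
    return None
```

When noise flips a pixel that a rule tests, the example either falls through to a wrong rule (a misclassification) or matches no rule at all (unclassified), which is exactly the behaviour analysed below.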

Figure 6

                        Test 1     Test 2     Test 3
Mean square error       0.0        0.15       0.26
Accuracy (%)            100        89         67
Sensitivity (%)         100        89         67
Specificity (%)         100        83         50
Correct patterns        9 of 10    8 of 10    6 of 10
Unclassified patterns   1 of 10    1 of 10    1 of 10

The values in the table above were calculated from the testing results of the learnt model. As the table shows, the mean square error increases when the training result is tested with noisier datasets: it rises from 0.0 for test 1 to 0.15 for test 2 and then 0.26 for test 3. Furthermore, the accuracy, the sensitivity of positive results and the specificity of negative results also decrease with the noisier datasets: accuracy and sensitivity drop from 100% to 89% and then 67%, and specificity goes down from 100% to 83% and then 50%. As a result, the algorithm correctly recognizes only nine patterns out of ten on the test 1 dataset, eight out of ten on test 2 and six out of ten on test 3. In addition, one pattern was left unclassified in every test, which is why test 1 shows 100% accuracy even though only nine patterns were predicted: the metrics are computed over the classified patterns only.

PRISM trains on the ideal dataset and then produces a set of rules for predicting the letters, such as: if position 5_3 = 0 and position 2_3 = 0, the pattern is recognized as A. However, the noise datasets were created by randomly changing positions, some of which are used in the rule set. For example, the rule set says: if position 1_4 = 1 and position 5_2 = 0 then the pattern is Â, and if position 1_4 = 1 and position 5_2 = 1 then it is Ê. In test 2, position 5_2 of pattern Ê was randomly changed from 1 to 0, so when the trained model is tested on the test 2 dataset, the rules predict Ê as Â.

The pattern that was unclassified is B, and the likely reason is a missing value in the tested file. The rule to identify B is: if position 8_4 = 1 and position 5_6 = 0 then the pattern is B. In the tested files, position 8_4 may have been changed to 0, so pattern B no longer satisfies the conditions of any rule in the set and the algorithm cannot tell which letter it is. That is why it could not predict pattern B.

IV. Conclusion:

Letter recognition is a simple example of AI's ability to learn and recognize patterns of objects. The datasets are represented as 2-dimensional matrices, so the approach could be applied to a wide range of subjects; other examples could be pictures, handwriting, etc. In this report, all three algorithms (artificial neural network, ID3 and PRISM) were trained and tested with different testing datasets. All three algorithms have their own strong points; however, based on this case, the artificial neural network is recommended for this kind of dataset. The ANN gave perfect results even when tested with the noisier test sets, whereas ID3 and PRISM found it harder and harder to recognize the letters in the noisy datasets, and PRISM could not even classify some of the patterns. With more time, it would be interesting to change some positions from 0 to 1, or to remove positions that belong to the diacritic symbols, to see how much noise the algorithms can handle.

V. Appendix:
1. Pattern designing:

2. Dataset:
% 1. Title: Vietnamese Alphabet
%
% 2. Sources:
%    (a) Creator: Kelvin Phan
%    (b) Date: 10/05/2013

@RELATION Vietnamese-Alphabet

@ATTRIBUTE Position1_1 {0,1}
@ATTRIBUTE Position1_2 {0,1}
@ATTRIBUTE Position1_3 {0,1}
@ATTRIBUTE Position1_4 {0,1}
@ATTRIBUTE Position1_5 {0,1}
@ATTRIBUTE Position1_6 {0,1}
@ATTRIBUTE Position1_7 {0,1}

@ATTRIBUTE Position2_1 {0,1}
@ATTRIBUTE Position2_2 {0,1}
@ATTRIBUTE Position2_3 {0,1}
@ATTRIBUTE Position2_4 {0,1}
@ATTRIBUTE Position2_5 {0,1}
@ATTRIBUTE Position2_6 {0,1}
@ATTRIBUTE Position2_7 {0,1}
@ATTRIBUTE Position3_1 {0,1}
@ATTRIBUTE CLASS {A,Ă,Â,B,C,D,Đ,E,Ê,G}
%=========================================================
@DATA
% Letter A
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,A
% Letter Ă
0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,Ă
% Letter Â
0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,Â
% Letter B
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,0,0,B
% Letter C
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,1,1,0,0,C
% Letter D
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,0,0,D
% Letter Đ
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,1,1,1,0,0,1,0,1,1,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,0,0,Đ
% Letter E
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,0,E
% Letter Ê
0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,1,1,0,Ê
% Letter G
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,1,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,0,1,1,1,1,0,G


3. Artificial Neural Network:


Test1 result:

Test2 result:


Test3 result:


4. ID3 Algorithm:
Test1 result:

Test2 result:


Test3 result:


5. PRISM Algorithm:
Test1 result:

Test2 result:


Test3 result:

