
International Journal of Computer Trends and Technology- volume4Issue3- 2013

Classification Rule Extraction using Artificial Neural Network


Pawan Suresh Upadhye, Parag Suresh Upadhye
Department of Information Technology, Pimpri Chinchwad College of Engineering, Nigdi, Pune University, Pune, India.

Abstract — In this paper, we present a new approach for extracting classification rules from feed-forward neural networks (NNs) that have been trained on data sets having both discrete and continuous attributes. The proposed approach first generates rules using a single-pass Apriori algorithm and then applies a Bayesian algorithm to filter the generated rules.

Index Terms — Data mining, Artificial Neural Networks, Classification, Rule Extraction, Normalization.

I INTRODUCTION
One important task in data mining is classification: the assignment of data into predefined groups or classes. There are a number of techniques for classification; some are knowledge based, some are rule based, and some are probabilistic. The neural network is one of the most popular approaches for classification as it provides good accuracy, but the results of an ANN are incomprehensible and its output is complex to understand. The decision tree is another popular approach for classification, but its complexity increases as the size of the dataset increases. Although various algorithms such as ID3, C4.5, and C5.0 have evolved for decision trees, reducing the size of the tree and extracting optimal rules becomes quite complex.

II PROPOSED SYSTEM
This paper describes a proposed system that takes a new approach to rule extraction from ANN results and provides a simple method with satisfactory accuracy.

2.1 Dataset: The dataset is a collection of historical data. It consists of a number of attributes which determine membership in a class. This data needs to be preprocessed before giving it as input to the neural network.

2.2 Normalization: Normalization is a preprocessing technique in which real-valued data is scaled into a specific range. Min-Max is an efficient technique used for normalization. In this technique every instance is normalized using the formula

B = ((A - min(A)) / (max(A) - min(A))) * (D - C) + C

where B is the normalized value, A is the attribute value in the dataset, and C and D determine the range into which we want to scale the values. This normalized data is then given as input to the ANN.
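As a sketch of the Min-Max formula above (the function name and list-based interface are illustrative, not from the paper), the normalization of one attribute column can be written as:

```python
def min_max_normalize(values, c=0.0, d=1.0):
    """Min-Max normalization: scale values into [c, d].

    Implements B = (A - min(A)) / (max(A) - min(A)) * (D - C) + C,
    where A ranges over one attribute column of the dataset.
    """
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [c for _ in values]
    return [(v - lo) / (hi - lo) * (d - c) + c for v in values]
```

For example, `min_max_normalize([10, 20, 30])` maps the column onto [0, 1], which is a common choice before feeding data to a sigmoid-activated network.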

ISSN: 2231-2803 http://www.internationaljournalssrg.org




Now the correctly classified set D is converted to discrete values using fuzzification. Fuzzy membership functions are used for the conversion as they give more efficient results: a value is assigned to a discrete range only if its membership in that range is maximum. An appropriate membership function (mf) is selected [4], and the levels for discretization are decided depending upon the output at each level. For example, temperature can be discretized as high, medium, low, or as very high, high, medium, low, very low, etc.

2.3 Neural Network: An ANN is a mathematical or computational model consisting of an interconnected group of artificial neurons that processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information flowing through the network during the learning phase.

Training the ANN: In the proposed system we use supervised training. Supervised learning is an approach in which every input pattern is associated with an output pattern, which is the target or desired class. The normalized dataset is given as input to the NN. The major task is to decide the architecture of the NN. The following algorithm can be used for deciding the architecture.
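As an illustrative sketch of the step-by-step architecture search that follows (the function name and the `train_and_evaluate` callback are our own assumptions; any feed-forward NN library could supply the callback), the growth procedure might look like:

```python
def select_architecture(n_inputs, train_and_evaluate, target_mse=0.1):
    """Grow the hidden-layer layout until the MSE target is met.

    train_and_evaluate(hidden_layers) -> MSE is assumed to be supplied
    by the caller; this function only encodes the search itself:
    grow the current hidden layer one neuron at a time up to 2i
    neurons (i = number of inputs), then add a new hidden layer.
    """
    best = (float("inf"), None)
    hidden = [1]                       # one hidden layer, one neuron
    while True:
        mse = train_and_evaluate(hidden)
        if mse < best[0]:
            best = (mse, list(hidden))
        if mse <= target_mse:          # desired accuracy reached
            return best[1], best[0]
        if hidden[-1] < 2 * n_inputs:  # grow current layer up to 2i
            hidden[-1] += 1
        elif len(hidden) < 3:          # then add another hidden layer
            hidden.append(1)
        else:                          # give up: best configuration seen
            return best[1], best[0]
```

The three-layer cap is an assumption added so the sketch always terminates; the paper itself only states the 2i per-layer condition.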

Algorithm for training:
Step 1: Start with the number of neurons in the input layer equal to the number of attributes, one hidden layer with one neuron, and an output layer with one neuron (or as many neurons as there are target classes).
Step 2: Train the neural network with the current architecture and the dataset.
Step 3: After training, compare the mean square error (MSE) with the expected value.
Step 4: If the MSE is greater than expected, add another neuron to the hidden layer and repeat from Step 2.
Step 5: Repeat Steps 2-4 while the number of neurons in the hidden layer is less than or equal to 2i (i = number of input nodes).
Step 6: If the desired accuracy is not achieved and the neurons in the hidden layer reach 2i, add another hidden layer with one neuron and repeat the above steps until the 2i condition is met again. Before adding a new layer, analyze the previous results and select the configuration with the least MSE and fewest neurons.
Step 7: Follow the above steps until the desired result is achieved.

Now let D be the set of samples correctly classified by the neural network.

2.4 Continuous to Discrete Conversion
2.5 Inference Engine:

The discretized data acts as input to the inference engine, which determines the classification rules. The triangular membership curve is a function of a vector x and depends on three scalar parameters a, b, and c [4], as given by

f(x; a, b, c) = max(min((x - a) / (b - a), (c - x) / (c - b)), 0)
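A minimal sketch of the triangular membership function and of the maximum-membership discretization described in Section 2.4 (function names and the dict-based level table are our own conventions):

```python
def trimf(x, a, b, c):
    """Triangular membership function: feet at a and c, peak at b.

    f(x; a, b, c) = max(min((x - a)/(b - a), (c - x)/(c - b)), 0)
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


def discretize(x, levels):
    """Assign x the level whose membership is maximum.

    `levels` maps a level name (e.g. "low", "high") to its (a, b, c)
    triangle parameters.
    """
    return max(levels, key=lambda name: trimf(x, *levels[name]))
```

For example, with overlapping "low" and "high" triangles, a value belongs to whichever level gives it the larger membership, matching the rule that a value is placed in a discrete range only if its membership there is maximal.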





The inference engine takes discretized data as input and determines the classification rules as output. It is built using two algorithms, single-pass Apriori and Naive Bayesian, for extracting rules. Single-pass Apriori [2][5] scans the dataset, checks the association between the input and the output class, and generates rules. The rules generated by Apriori contain some overlapping classes, and some conditions are not covered by the rule set. The Bayesian algorithm is used to resolve the overlapping rules and to generate rules for the unhandled conditions.

Algorithm:
1. D: set of tuples. Each tuple is an n-dimensional attribute vector X = (x1, x2, x3, ..., xn).
2. Let there be m classes: C1, C2, C3, ..., Cm.
3. The Naive Bayes classifier predicts that X belongs to class Ci if P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i.
4. Maximum posteriori hypothesis: P(Ci|X) = P(X|Ci) P(Ci) / P(X). Maximize P(X|Ci) P(Ci), as P(X) is constant.
5. With many attributes, it is computationally expensive to evaluate P(X|Ci) directly.
6. Hence the naive assumption of class-conditional independence is made, so that P(X|Ci) is the product of the per-attribute probabilities P(xk|Ci).
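The steps above can be sketched as follows for discrete attributes. The function name, the `(attribute_tuple, class_label)` record format, and the add-one (Laplace) smoothing are our own assumptions; the argmax over P(Ci) * Π P(xk|Ci) is the maximum posteriori rule from the algorithm:

```python
from collections import Counter, defaultdict

def naive_bayes_predict(records, x):
    """Predict the class of discrete vector x by maximum posteriori:
    argmax_Ci P(Ci) * prod_k P(x_k | Ci), under the naive assumption
    of class-conditional independence. `records` is a list of
    (attribute_tuple, class_label) pairs; add-one smoothing avoids
    zero probabilities for unseen attribute values.
    """
    class_counts = Counter(c for _, c in records)
    attr_counts = defaultdict(Counter)  # (class, position) -> value counts
    for attrs, c in records:
        for k, v in enumerate(attrs):
            attr_counts[(c, k)][v] += 1
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / len(records)      # prior P(Ci)
        for k, v in enumerate(x):       # likelihoods with add-one smoothing
            score *= (attr_counts[(c, k)][v] + 1) / (n_c + 2)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

In the proposed system this classifier would be queried for the attribute combinations that the Apriori pass left unhandled or in conflict.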

Single-pass Apriori algorithm: Classification algorithms generally generate decision trees or classifiers according to a predetermined target; therefore, they need to be tuned to produce human-readable rules that can be used in decision support. In this study, an integrated approach produces association rules that can be used as classifiers. The Apriori algorithm [2] was used as a base model and modified to generate human-readable classification association rules.

Input: Dataset with discrete values.
Output: Rule set.

Algo_SinglepassApriori() {
Step 1: For every record in the dataset: if the key is present in the hash table, increment the key's value (association count); else insert the record into the hash table and initialize its association count to one.
Step 2: Perform Step 1 until all records in the input dataset are scanned.
Step 3: Check the conflicting (overlapping) rules and eliminate those cases.
Step 4: The contents of the hash table are the rules.
}

Naive Bayesian Classification: It is based on Bayes' theorem and is particularly suited when the dimensionality of the input is high. Parameter estimation for naive Bayes models uses the method of maximum likelihood. In spite of its over-simplified assumptions, it often performs well in many complex real-world situations. The Bayesian algorithm is used to validate the rules extracted by the Apriori algorithm. Advantage: it requires only a small amount of training data to estimate the parameters.
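A compact sketch of Algo_SinglepassApriori above (the function name and the `(attribute_tuple, class_label)` record format are our own conventions):

```python
def single_pass_apriori(records):
    """One scan over (attribute_tuple, class_label) records.

    Steps 1-2: count each antecedent -> class association in a hash
    table. Step 3: eliminate conflicting (overlapping) rules, i.e.
    antecedents seen with more than one class. Step 4: the remaining
    entries are the rule set.
    """
    table = {}                                  # antecedent -> {class: count}
    for attrs, cls in records:                  # single pass over the data
        table.setdefault(attrs, {})
        table[attrs][cls] = table[attrs].get(cls, 0) + 1
    return {attrs: next(iter(classes))
            for attrs, classes in table.items() if len(classes) == 1}
```

The eliminated (conflicting) antecedents are exactly the cases the paper hands over to the Bayesian step for resolution.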

III EXPERIMENTAL RESULTS
This section describes the results of the proposed system on the BUPA liver disorders dataset, which has 345 instances and 6 attributes plus a selector field. Attribute information:
1. mcv: mean corpuscular volume
2. alkphos: alkaline phosphatase
3. sgpt: alanine aminotransferase
4. sgot: aspartate aminotransferase
5. gammagt: gamma-glutamyl transpeptidase
6. drinks: number of half-pint equivalents of alcoholic beverages drunk per day
7. selector: field used to split the data into two sets

Training results:

Sr no | Layers | Neurons | Regression | MSE
1     | 2      | 7,3     | 0.87       | 0.230
2     | 2      | 7,4     | 0.82       | 0.306
3     | 2      | 7,1     | 0.7        | 0.444
4     | 2      | 8,6     | 0.805      | 3.45
5     | 3      | 7,3,3   | 0.6        | 3.85
6     | 2      | 9,9     | 0.933      | 0.128
7     | 3      | 9,10    | 0.939      | 0.116
8     | 3      | 9,11    | 0.927      | 0.140
9     | 3      | 9,12    | 0.922      | 0.157
10    | 3      | 9,10,4  | 0.94       | 0.099
11    | 3      | 9,10,5  | 0.94       | 0.10





%Training accuracy = NCC / TR
where NCC = number of correctly classified records and TR = total records (instances).
%Training accuracy: 92.68%

Continuous to Discrete Conversion:

Attr | D1  | D2    | D3     | D4      | D5      | D6      | D7
A1   | <68 | 68-70 | 71-80  | 81-87   | 88-93   | 94-99   | >99
A2   | <32 | 33-51 | 52-70  | 71-90   | 91-109  | 110-128 | >128
A3   | <16 | 17-41 | 42-66  | 67-91   | 92-116  | 117-141 | >141
A4   | <11 | 12-24 | 25-37  | 38-49   | 50-62   | 63-75   | >75
A5   | <29 | 30-77 | 78-126 | 127-175 | 176-226 | 227-275 | >275
A6   | <2  | 2-5   | 6-8    | 9-11    | 12-15   | 16-18   | >18
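The boundary table above can be applied with a simple lookup. A sketch for attribute A1 (mcv), using the paper's D1-D7 levels (the function name is our own):

```python
def discretize_a1(mcv):
    """Map attribute A1 (mcv) to its level using the boundary table:
    D1 < 68, D2 68-70, D3 71-80, D4 81-87, D5 88-93, D6 94-99, D7 > 99.
    """
    bounds = [(68, "D1"), (71, "D2"), (81, "D3"),
              (88, "D4"), (94, "D5"), (100, "D6")]
    for upper, level in bounds:
        if mcv < upper:
            return level
    return "D7"
```

The other five attributes follow the same pattern with their own boundary lists.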

[Fig. Training Performance]

Inference Engine
The inference engine consists of the single-pass Apriori algorithm and Bayesian classification. Both algorithms work in a coordinated manner: initially, Apriori is applied to extract classification rules from the input data; then the Bayesian algorithm is used to check, validate, and correct the rules generated by Apriori [2][5].

[Fig. Regression plot after training]
[Fig. Architecture of ANN]

The rules generated by the inference engine have the following form:

Ai ∈ Dj, Ai+1 ∈ Dj, Ai+2 ∈ Dj, ..., An ∈ Dj => Ci

where Ai denotes the attribute number, Dj denotes the discretized level, and Ci denotes the class.

IV CONCLUSION
This approach provides a new way of extracting rules for classification, and the rules it generates provide satisfactory accuracy. However, the number of rules generated by this approach is large. This can be overcome by using Apriori in an incremental manner, i.e., by eliminating the least significant attributes (by entropy) and thus reducing the number of rules.

Neural Network Training Results:

Total instances | Correctly trained | Incorrectly trained
345             | 337               | 08





REFERENCES
[1] R. Setiono, B. Baesens, and C. Mues, "Recursive Neural Network Rule Extraction for Data With Mixed Attributes," IEEE Transactions on Neural Networks, vol. 19, no. 2, February 2008.
[2] V. Katkar and R. Mathew, "One Pass Incremental Association Rule Detection Algorithm for Network Intrusion Detection System," International Journal of Engineering Science and Technology (IJEST).
[3] B. Baesens, R. Setiono, C. Mues, and J. Vanthienen, "Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation," Management Science, vol. 49, no. 3, March 2003, pp. 312-329.
[4] F. di Sciascio and R. Carelli, "Fuzzy Basis Functions for Triangle-Shaped Membership Functions: Universal Approximation — MISO Case," Instituto de Automática, Universidad Nacional de San Juan.
[5] B. Tunc and H. Dag, "Generating Classification Association Rules with Modified Apriori Algorithm," Computational Science and Engineering Department, Istanbul Technical University (ITU), Proceedings of the 5th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain, February 15-17, 2006, pp. 384-387.
[6] K. P. Murphy, "Naive Bayes Classifiers," October 24, 2006. http://www.cs.ubc.ca/~murphyk/Teaching/CS340Fall06/reading/NB.pdf


