
Classification of ECG Patterns Using Fuzzy Rules Derived from ID3-Induced Decision Trees

Amine M. Bemaid*+, Nadia Bouhouch', Rachida Bouhouch", Roukiya Fellat" and Rachida h i "
Abstract
In cardiology, determining whether an electrocardiogram (ECG) is normal or not is sometimes referred to as ECG classification. ECG is the most frequently used means of cardiac diagnosis. It is the cheapest and the most widely available; it is also crucial for detecting rhythmic problems. In this paper we derive fuzzy rules for ECG classification from ID3-induced decision trees. The system of fuzzy rules is designed based on 106 ECG's, and it is evaluated using a validation set of 48 ECG's carefully selected by cardiologists. Using the same 106 ECG's for design and the same 48 ECG's for validation, an ID3-generated decision tree yields 73% correct classifications, and a neural network trained with the feed-forward cascade-correlation algorithm produces 85.4% correct classifications. On the other hand, the derived fuzzy rules, combined with an optimized defuzzification using the cascade-correlation neural network, produce 100% correct classifications.

Key words: ECG, ID3, fuzzy rules, defuzzification and neural networks.

1. Introduction
Different approaches from pattern recognition, machine learning and expert systems have been used in intelligent diagnostic systems. Schiffmann et al. [1] compared several classification algorithms using a medical data set; Rprop, an optimized version of the backprop neural network algorithm, produced the best results, being surpassed only by the feed-forward cascade-correlation (FFCC) neural network (NN). Dietmar [2] applied fuzzy logic in some medical applications, such as ECG analysis, drug kinetics and anesthesia monitoring. The decision tree-based approach, as a machine learning technique, has been applied successfully to several practical problems [3]. Quinlan's ID3 algorithm and its variations [4,5] have been applied to various classification problems because of their ease of implementation and the comprehensibility of the decision trees. ID3-derived rules work well when the input features have symbolic and discrete values. However, the ID3-based algorithms, which are symbolic in nature, are known to perform worse when modeling domains in which there is a large number of continuous-valued features [6,7].

For the classification of ECG's, incorporating fuzzy concepts into rule sets could be a way to approach the system performance achieved by expert cardiologists. We use the algorithm proposed in [8] for rule fuzzification; it is based on the ID3 decision tree algorithm and on a neural network-based optimized defuzzification method. We produce fuzzy rules for ECG classification. In Section 2, we present some previous attempts to classify ECG's. Section 3 describes the different algorithms applied in Section 6 to ECG classification. The optimized defuzzification method is described in Section 4. Section 5 outlines the structure of ECG data. Section 6 reports the performance of ID3-derived fuzzy rules through experimental results. Finally, Section 7 summarizes the conclusions.

2. Some Previous Attempts at ECG Classification

Several techniques have been proposed for ECG classification. Grauel et al. [9] develop a fuzzy rule-based system for ECG interpretation using medical expert knowledge, but it remains useful for basic diagnosis only. Vullings et al. [10] develop an algorithm to decide whether an ECG is normal or not. The algorithm first extracts features from the ECG using an unsupervised single-layer NN. Then it clusters the length of the separate features into several sets. Finally, a reference set of features, classified normal by an anesthesiologist, is compared with a newly obtained set of features, using fuzzy clusters obtained from the previous step. If the similarity exceeds a certain threshold, the signal is classified as normal. Preliminary results based on four patients indicate that this method is promising for the problem at hand. In [11], NNs outperform two widely used rule-based interpretation programs for classifying ECGs whose acquisition suffers from misplacement of electrodes during the recording of an ECG.

* Corresponding Author, amine@AlAkhawayn.ma
+ Graduate Computer Science Division, Al Akhawayn University in Ifrane (AUI), School of Science and Engineering, Ifrane, Morocco.
Cardiologie B, Hopital d'Enfants, Rabat, Morocco.
Cardiologie A, Avicennes, Rabat, Morocco.

0-78034453-7/98 $10.00 © 1998 IEEE

3. Algorithms Description
In this work, we seek a system (1) that learns to classify ECGs, (2) whose acquired knowledge can be explicated then verified by expert cardiologists and (3) with very good classification performance. The different algorithms used to build such a system are described next.

3.1. ID3-Decision Trees and Rules

Quinlan's ID3 [4] produces a classification tree by learning from training examples. Each training pattern consists of a set of m input features ($F_i$, i = 1, 2, ..., m) and an associated class. Figure 1 shows a decision tree produced by ID3 for a four-input problem. Each node represents a feature $F_i$, while each branch, $f_{ij}$, represents the j-th interval of values of feature $F_i$, which serves as a test. For a feature with discrete values, $f_{ij}$ stands for the j-th value of feature $F_i$. A leaf node, $L_p$, contains a classification $C(L_p)$. ID3 generates conjunctive rules. There are altogether seven rules represented by the decision tree shown in Figure 1. One of these rules (the highlighted path) is: IF $F_1(f_{13})$ AND $F_3(f_{31})$ AND $F_4(f_{41})$ THEN class is $C(L_p)$, where $L_p$ is the leaf at the end of that path and $F_i(f_{ij})$ is defined as

$$F_i(f_{ij}) = \begin{cases} 1, & \text{if feature } F_i \text{ has value } f_{ij} \\ 0, & \text{otherwise} \end{cases}$$

After the tree is built (using, for example, a set of ECGs), a new pattern (e.g., a new ECG) can be classified by starting at the top of the tree, answering questions and following branches accordingly until we reach a leaf, where the classification is stored. An ID3 rule for ECG classification may look like: IF (the patient is resting AND the heart frequency is greater than 60 and less than 80 AND the value of feature S belongs to [0.3, 1.5] AND the Amplitude of T wave is less than -0.5) THEN the ECG is abnormal.

Figure 1. A decision tree produced by the ID3 approach.

3.2. Rule Fuzzification

ID3 rules work well when input data are accurate. However, their performance degrades when input data are uncertain, noisy, or fuzzy [8]. Fuzzifying ID3-generated rules can be used to address the fuzziness in the data and improve system performance. Generally, fuzzy rule-based systems have been applied successfully for both control and pattern recognition [12]-[16]. In some cases, fuzzy rules have been acquired from experts; in other cases, they have been produced by learning from examples [15,16]. However, when the number of training patterns and the number of features are both large, the induced rule set is large and the IF-parts of rules are long; consequently, classification becomes computationally expensive. Fuzzifying the simpler ID3-induced crisp rules is an effective way to produce a compact set of decision rules.

Maher and St. Clair [5] have presented a so-called UR-ID3 algorithm to combine uncertain reasoning with the rule sets produced by ID3 to deal with uncertain training and test data. In their approach, the value associated with each branch of the tree is considered to be approximate and so are feature values. A triangularly shaped membership function is used for each of these values. The classification of a test pattern is done by considering the corresponding set of support intervals for each possible classification. Following Maher and St. Clair's idea to combine the fuzzy concept and ID3 decision trees, Chi and Yan [8] developed an approach to fuzzify simple rules derived from an ID3 decision tree to obtain a compact set of fuzzy rules.

In an ID3 rule, antecedent conditions are either TRUE or FALSE, and only one rule is chosen to perform classification. Following [8], we convert these rules to fuzzy rules, such that a fuzzy rule may be (partially) applicable whenever a fact that partially fulfills its condition is true. One of two different approaches is taken to fuzzify the crisp rules, depending on whether the feature involved in the IF-part of the rule is discrete or continuous:

Discrete features: for a feature $F_i$ with L discrete values, an L×L fuzzy relational matrix is constructed to capture the similarity between the different values that feature $F_i$ can take on. For the purpose of ECG classification, the similarity matrix is constructed by cardiologists. For example, the feature PatientStatus can take on one of five discrete values (At-Rest, Changing-Position, In-Hyperventillation, Eating, Sleeping), so its fuzzy relational matrix is 5×5. According to this matrix, when, for example, the patient is eating, a rule whose IF-part is "IF the patient is resting" is partially applicable with degree 0.4.

Continuous-valued features: for a continuous-valued feature $F_j$, the shape of a piecewise-linear membership function is constructed by cardiologists. This function description is then converted into mathematical equations. For example, consider the feature Age that can take on "Newborn" as value. The membership function $\mu_{F_j}(\text{Newborn})$ for Newborn is described by:

$$\mu_{F_j}(\text{Newborn}) = \begin{cases} (-0.1 \cdot \text{age}) + 1, & \text{if age} \le 1 \text{ day} \\ (-0.016 \cdot \text{age}) + 0.916, & \text{if } 1 \text{ day} < \text{age} < 1 \text{ week} \\ (-0.043 \cdot \text{age}) + 0.83, & \text{if } 1 \text{ week} < \text{age} < 1 \text{ month} \\ (-0.011 \cdot \text{age}) + 1.044, & \text{if } 1 \text{ month} \le \text{age} < 3 \text{ months} \\ 0, & \text{if age} \ge 3 \text{ months} \end{cases}$$

As a result, when for example a patient is 2 months old, a rule whose IF-part is "IF patient is a Newborn" is partially applicable with degree 0.373.

3.3. The Feed-Forward Cascade-Correlation Neural Network Algorithm

The feed-forward cascade-correlation (FFCC) neural network algorithm was developed in an attempt to overcome certain problems and limitations of the popular feed-forward back-propagation (backprop) neural network algorithm [17]. FFCC mitigates backprop's slowness by using the quickprop algorithm [18] to address the step-size problem and achieve a faster move to a minimum of the error function. Another problem associated with backprop is the moving-target problem: each hidden node tries to update its weights in order to minimize the error signal as it was backpropagated to this node; however, this error signal is being changed concurrently by other hidden nodes which are updating their weights with conflicting objectives. FFCC addresses the moving-target problem by introducing a dynamic cascade architecture and an associated learning algorithm [19]. FFCC's architecture evolves dynamically. It starts with input units, output units and no hidden units. Then it automatically trains and adds new hidden units one by one, creating a multi-layer structure with one hidden unit per layer. Once a new hidden unit has been added to the network, its input-side weights are frozen. For each new hidden unit, the magnitude of the correlation is maximized between the new unit's output and the residual error.

4. Optimized Defuzzification
A commonly used defuzzification method for fuzzy logic control applications is the centroid defuzzification. However, when the membership function of the output is convex, the centroid method may not yield satisfactory results. We use the optimized defuzzification technique described in [8]. It consists of using a NN to learn the "optimal" crisp output of the fuzzy rule-based system.

Assume the system consists of K conjunctive rules. The NN's input consists of a K-dimensional vector where the i-th component corresponds to the degree $D_i^p$ to which the (input) ECG pattern $p$ ($p = \langle F_1 = p_1, F_2 = p_2, \ldots, F_j = p_j, \ldots, F_L = p_L \rangle$) matches the antecedent condition of the i-th rule. Let fuzzy rule i be: IF ($F_1$ is $A_{i1}$ AND $F_2$ is $A_{i2}$ AND ... $F_j$ is $A_{ij}$ AND ... $F_L$ is $A_{iL}$) THEN Class is $C_i$; and let $\mu_{ij}$ be the membership grade of $p_j$ in fuzzy set $A_{ij}$. The value $D_i^p$ of the i-th input to the NN is computed from the membership grades $\mu_{ij}$ (j = 1, 2, ..., L).

The NN has one output node, which takes one of two target class values: 1 for "Normal" ECG and 0 for "Abnormal."
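As an illustration of how the K-dimensional input vector of the defuzzification network could be assembled, here is a minimal Python sketch. It rests on several assumptions that are not taken from the paper: ages are expressed in days (with 1 week = 7, 1 month = 30 and 3 months = 91 days), the grades of a conjunctive rule are combined with the minimum operator, a value is fully similar to itself, and all function and variable names are hypothetical. The Newborn coefficients are copied as printed in Section 3.2 and clamped to [0, 1] as a safeguard; only the single similarity entry stated in the text (At-Rest versus Eating, 0.4) is used.

```python
from typing import Any, Callable, Dict, List

Pattern = Dict[str, Any]
Condition = Callable[[Pattern], float]  # returns a membership grade in [0, 1]


def newborn_grade(age_days: float) -> float:
    """Piecewise-linear membership of Age in 'Newborn' (age in days, an assumption)."""
    if age_days <= 1:
        grade = -0.1 * age_days + 1
    elif age_days < 7:       # assumed: 1 week = 7 days
        grade = -0.016 * age_days + 0.916
    elif age_days < 30:      # assumed: 1 month = 30 days
        grade = -0.043 * age_days + 0.83
    elif age_days < 91:      # assumed: 3 months = 91 days
        grade = -0.011 * age_days + 1.044
    else:
        grade = 0.0
    return min(1.0, max(0.0, grade))  # clamp as a safeguard


# Only the entry quoted in the text is known; the diagonal value is an assumption.
STATUS_SIMILARITY = {
    ("At-Rest", "At-Rest"): 1.0,
    ("At-Rest", "Eating"): 0.4,
}


def status_grade(rule_value: str, observed: str) -> float:
    """Similarity between the value named in the rule and the observed value."""
    return STATUS_SIMILARITY[(rule_value, observed)]


def matching_degree(conditions: List[Condition], pattern: Pattern) -> float:
    """Degree to which a pattern matches a conjunctive IF-part (min is an assumption)."""
    return min(condition(pattern) for condition in conditions)


def degree_vector(rules: List[List[Condition]], pattern: Pattern) -> List[float]:
    """One matching degree per rule: the K-dimensional input of the network."""
    return [matching_degree(rule, pattern) for rule in rules]


# Two hypothetical fuzzy rules (K = 2), given by their IF-parts only.
rules = [
    [lambda p: status_grade("At-Rest", p["PatientStatus"]),
     lambda p: newborn_grade(p["Age"])],
    [lambda p: status_grade("At-Rest", p["PatientStatus"])],
]

pattern = {"PatientStatus": "Eating", "Age": 61}  # a 2-month-old patient who is eating
D = degree_vector(rules, pattern)
print(D)
```

With age taken as 61 days, the Newborn grade evaluates to about 0.373, which agrees with the degree quoted in Section 3.2 for a 2-month-old patient. A different conjunction operator (for example, the product) could be substituted in `matching_degree` without changing the rest of the sketch.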

5. The Nature of ECG Data


Electrocardiography enables recording electric processes of the heart function. An electrocardiogram (ECG) is recorded in lead groups simultaneously. The ECG leads are assumed to be the projection of the so-called electrical heart vector onto the equilateral Einthoven triangle in the frontal plane only [20]. For analytical ECG diagnostics, at least twelve leads are used, namely six extremity leads and six thoracic wall leads. These twelve leads are generally called standard leads. The ECG signals of these standard leads are passed on to a feature extraction sub-system, where characteristic features that are used by physicians for classification are calculated. This paper is not concerned with extracting the features that describe an ECG; rather, it deals with determining the normality of an ECG given its feature description.
A normal ECG is composed of a P wave, a QRS complex, and a T wave. The P wave represents the spread of excitation in the atria. The QRS complex, which represents the spread of excitation in the ventricles, actually consists of three separate waves: the Q wave, the R wave and the S wave. The T wave indicates the repolarization in the ventricles.

We represent an ECG pattern with a set of 22 features that were selected by three cardiologists as being most relevant for ECG classification. Eighteen of these features are extracted from the ECG signal; the other four features are: patient-status, sex, age and position of the electrode. Eight of these features are symbolic (e.g. patient-status) and 14 are numeric (e.g. dp: duration of the wave P). Cardiologists indicate that the interpretation of ECG's is based on intervals of values for each feature rather than on the values themselves. Consequently, the range of numeric values of a feature is partitioned into intervals based on normal values of that feature, as indicated by cardiologists. For example, normal values for the feature dp are: dp ≤ 0.12 for adults and 0.04 ≤ dp ≤ 0.08 for newborns. As a result, the numeric range for dp is reduced to four discrete values: dp1: [0, 0.04[, dp2: [0.04, 0.08], dp3: ]0.08, 0.12], dp4: ]0.12, +∞[.

6. Experiments and Discussion

The first step in our experiments consists of generating a decision tree using ID3. The version of ID3 we employ uses an information entropy measure to select the feature to be used at each node and construct the tree accordingly. We do not employ windowing [21], chi-square forward pruning [22], or any kind of reverse pruning [23]. This accounts for the relatively large number of rules we obtain. Our experiments are based on 106 patterns and the corresponding class labels provided by cardiologists to form a training set $X_d$, along with 48 additional patterns that make up a validation set X'. Patterns in X' were carefully chosen by cardiologists so that if these patterns are correctly classified by our system, cardiologists will be able to assert with confidence that the system is satisfactory for ECG classification. We carried out two different sets of experiments using $X_d$: one implementing the leave-one-out strategy [24] and the other one implementing the iterative strategy [4]. These two sets of experiments were used to identify the subset of $X_d$ that produces the decision tree with the best performance. This procedure led to selecting a subset $X_b \subset X_d$ with 53 patterns. ID3 trained with $X_b$ produces a decision tree that classifies correctly all 106 patterns in $X_d$. We will refer to the set of 80 rules extracted from this decision tree as $T_{ID3}$. Applying these rules to patterns in the validation set X' results in about 27% misclassification rate. These relatively high numbers (both for the number of rules and for the misclassification rate) may lead one to believe that ID3 is not generalizing as much as it could from $X_b$.

For comparison purposes, an FFCC neural net was trained using $X_b$. The trained network uses 46 hidden units and misclassifies 4 out of the 53 patterns in ($X_d - X_b$); furthermore, it misses 14.6% of the patterns in X'.

The next experiment consists of fuzzifying the 80 rules in $T_{ID3}$ using the procedure outlined in Section 3.2. We will refer to the resulting set of fuzzy rules as $T_{ID3-F}$. Next, each pattern $p_j \in X_b$ (j = 1, 2, ..., 53) is submitted to each rule i (i = 1, 2, ..., 80) in $T_{ID3-F}$, and the corresponding $D_i^j$ is computed, as in Section 4. The resulting 53 eighty-dimensional vectors $\langle D_1^j, D_2^j, \ldots, D_{80}^j \rangle$ (j = 1, 2, ..., 53) and the corresponding training class labels are used to train an FFCC neural net to perform an optimized defuzzification and produce the classification of a new ECG pattern $p_n$ given its computed degrees $D_i^n$. The trained FFCC correctly classifies all 106 patterns in $X_d$ (this includes the 53 patterns from $X_b$). Moreover, it produces a 0% error rate on the 48 patterns that make up the validation set X'. This is an indication that the set of 80 induced fuzzy rules has good generalization capabilities, even though only 53 patterns (elements of $X_b$) are used for training in 22-dimensional space.
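The percentages reported for these experiments can be related to pattern counts with a few lines of arithmetic. The Python sketch below is only a consistency check: the count of 4 errors out of 53 test patterns is stated in the text, whereas the validation-set error counts of 13 (crisp ID3 rules) and 7 (FFCC network) are inferred here from the reported rates of about 27% and 14.6% on 48 patterns and are not given by the authors. The function name is hypothetical.

```python
def percent_correct(n_wrong: int, n_total: int) -> float:
    """Percentage of correctly classified patterns."""
    return 100.0 * (n_total - n_wrong) / n_total


N_TEST = 53        # patterns of the design set not used for training
N_VALIDATION = 48  # validation patterns selected by cardiologists

# Stated in the text: the FFCC network misclassifies 4 of the 53 test patterns.
ffcc_test = percent_correct(4, N_TEST)

# Inferred counts (assumptions): 13/48 is about 27%, 7/48 is about 14.6%.
id3_validation = percent_correct(13, N_VALIDATION)
ffcc_validation = percent_correct(7, N_VALIDATION)

# Stated in the text: the fuzzy rules make no error on the validation set.
fuzzy_validation = percent_correct(0, N_VALIDATION)

for name, value in [
    ("FFCC, test set", ffcc_test),
    ("ID3 crisp rules, validation set", id3_validation),
    ("FFCC, validation set", ffcc_validation),
    ("Fuzzy rules + FFCC defuzzification, validation set", fuzzy_validation),
]:
    print(f"{name}: {value:.1f}% correct")
```

Under these inferred counts, the rounded results match the 92.5%, 73%, 85.4% and 100% figures quoted in the abstract and conclusions.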

7. Conclusions
In this paper, we derived fuzzy rules for ECG classification from ID3-induced decision trees, and we used a neural net-based optimized defuzzification scheme to produce crisp classifications. The rules were generated based on a design set of 106 ECG patterns (divided, after extensive experimentation, into 53 training and 53 test patterns) and tested using a validation set of 48 ECG's carefully selected by cardiologists. These fuzzy rules achieved perfect performance: 0 misclassifications on both the test set and the validation set. In contrast, crisp rules induced by ID3 using the same training set had a rate of only 73% correct classification on the validation set. Furthermore, an FFCC neural network using the same training set produced 92.5% correct classifications on the test set and 85.4% correct classifications on the validation set. This empirical evidence indicates that ID3-derived fuzzy rules are better than the straightforward ID3 for ECG classification. Furthermore, the fuzzy rule-based system that has been developed can reliably be used for screening abnormal ECGs. Still, a smaller number of rules is likely to be obtained if (prior to rule fuzzification) the ID3-induced decision tree is pruned.

References
[1] W. Schiffmann, M. Joost, and R. Werner. Comparison of optimized backprop algorithms, in: M. Verleysen (ed.), European Symposium on Artificial Neural Networks, Brussels, pp. 97-104, 1993.
[2] Dietmar P. F. Moller. Fuzzy Logic in Medicine, EUFIT 96, September 2-5, 1996.
[3] Kononenko, I., Bratko, I., E. Experiments in automatic learning of medical diagnostic rules, Technical report, Josef Stefan Institute, Ljubljana, Yugoslavia, 1984.
[4] J. R. Quinlan. Induction of decision trees, Machine Learning, vol. 1, pp. 81-106, 1986.
[5] P. E. Maher and D. St. Clair. Uncertain reasoning in an ID3 machine learning framework, in Proc. Second IEEE Int. Conf. Fuzzy Syst., San Francisco, CA, 1993, pp. 7-12.
[6] Z. Chi and M. Jarbi. A comparison of MLP and ID3-derived approaches for ECG classification, in Proc. Second Australian Conf. Neural Networks, Sydney, Australia, 1991, pp. 263-266.
[7] T. G. Dietterich, H. Hild, and G. Bakiri. A comparison of ID3 and backpropagation for English text-to-speech mapping, Tech. Rep., Dept. Comput. Sci., Oregon State Univ., Corvallis, OR.
[8] Z. Chi and H. Yan. ID3-derived fuzzy rules and optimized defuzzification for handwritten numeral recognition, IEEE Trans. Fuzzy Syst., vol. 4, no. 1, pp. 24-31, 1996.
[9] A. Grauel, G. Klene, L. A. Ludwig. Rule-Based Fuzzy Logic System for ECG Interpretation, EUFIT 97, September 8-11, 1997.
[10] E. Vullings, A. J. Krijgsman, and H. B. Verbruggen. Validating ECG Signals using an ANN and Fuzzy Clusters, EUFIT 96, September 2-5, 1996.
[11] Bo Heden, Mattias Ohlsson, Lars Edenbrandt, Ralf Rittner, Olle Pahlm, and Carsten Peterson. Artificial Neural Networks for Recognition of Electrocardiographic Lead Reversal, Neural Networks for Analysis of Electrocardiograms, pp. 929-933, 1995.
[12] C. C. Lee. Fuzzy logic in control systems: Fuzzy logic controller, Part I, IEEE Trans. Syst., Man, Cybern., vol. 20, no. 2, pp. 404-418, 1990.
[13] C. C. Lee. Fuzzy logic in control systems: Fuzzy logic controller, Part II, IEEE Trans. Syst., Man, Cybern., vol. 20, no. 2, pp. 419-435, 1990.
[14] J. C. Bezdek and S. K. Pal, Eds. Fuzzy Models for Pattern Recognition. Piscataway, NJ: IEEE Press, 1992.
[15] Z. Chi and H. Yan. Map image segmentation based on thresholding and fuzzy rules, Electron. Lett., vol. 29, no. 21, pp. 1841-1843, 1993.
[16] Z. Chi and H. Yan. Handwritten numeral recognition using self-organizing maps and fuzzy rules, Pattern Recognition, vol. 28, no. 1, pp. 59-66, 1995.
[17] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning Internal Representations by Error Propagation, in Rumelhart, D. E. and McClelland, J. L., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, 1986.
[18] Fahlman, S. E. Faster-Learning Variations on Back-Propagation: An Empirical Study, in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.
[19] Scott E. Fahlman and Christian Lebiere. The Cascade-Correlation Learning Architecture, in Advances in Neural Information Processing Sys., 2 (D. S. Touretzky, ed.), pp. 524-532. San Mateo, CA: Morgan Kaufmann, February 1990.
[20] Pierre Godeau. Traite de Medecine, Tome 1, 2ieme edition, Medecine Sciences Flammarion, Paris, 1987.
[21] J. R. Quinlan. Learning efficient classification procedures and their application to chess endgames, in Michalski, R. S., Carbonell, J., and Mitchell, T. M. (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. I, Palo Alto: Tioga Press, pp. 463-482, 1983.
[22] J. R. Quinlan. The effect of noise on concept learning, in Michalski, R. S., Carbonell, J., and Mitchell, T. M. (eds.), Machine Learning, Vol. II, Palo Alto: Tioga Press, pp. 149-166, 1986.
[23] J. R. Quinlan. Simplifying decision trees, International Journal of Man-Machine Studies, 27, pp. 221-234, 1987.
[24] Devijver, P. and Kittler, J. Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, NJ, 1982.
