IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 14, NO. 4, JULY 2010
BARAKAT et al.: INTELLIGIBLE SUPPORT VECTOR MACHINES FOR DIAGNOSIS OF DIABETES MELLITUS

Abstract: Diabetes mellitus is a chronic disease and a major public health challenge worldwide. According to the International Diabetes Federation, there are currently 246 million diabetic people worldwide, and this number is expected to rise to 380 million by 2025. Furthermore, 3.8 million deaths are attributable to diabetes complications each year. It has been shown that 80% of type 2 diabetes complications can be prevented or delayed by early identification of people at risk. In this context, several data mining and machine learning methods have been used for the diagnosis, prognosis, and management of diabetes. In this paper, we propose utilizing support vector machines (SVMs) for the diagnosis of diabetes. In particular, we use an additional explanation module, which turns the “black box” model of an SVM into an intelligible representation of the SVM’s diagnostic (classification) decision. Results on a real-life diabetes dataset show that intelligible SVMs provide a promising tool for the prediction of diabetes, where a comprehensible ruleset has been generated, with prediction accuracy of 94%, sensitivity of 93%, and specificity of 94%. Furthermore, the extracted rules are medically sound and agree with the outcome of relevant medical studies.

Index Terms: Data mining, diabetes, machine learning, medical diagnosis.

Manuscript received June 6, 2009; revised October 20, 2009; accepted December 16, 2009. Date of publication January 12, 2010; date of current version July 9, 2010.
N. H. Barakat is with the Department of Applied Information Technology, German University of Technology in Oman, Muscat 130, Oman (e-mail: n.barakat@uq.edu.au).
A. P. Bradley is with the School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, QLD 4072, Australia (e-mail: bradley@itee.uq.edu.au).
M. N. H. Barakat is with the Department of the Noncommunicable Diseases Surveillance and Control, the Ministry of Health, Muscat 113, Oman (e-mail: mnbarkat@yahoo.co.uk).
Digital Object Identifier 10.1109/TITB.2009.2039485
1089-7771/$26.00 © 2010 IEEE

I. INTRODUCTION

DIABETES is a disease primarily associated with an increase in the level of blood glucose (hyperglycaemia) [1]. One reason for hyperglycaemia is insulin deficiency, where beta cells in the pancreas fail to produce enough insulin. This is known as type 1 diabetes. The other and most common type of diabetes is recognized as type 2, where the body cannot effectively use the insulin produced [2].

Chronic hyperglycaemia in people with diabetes increases the risk of microvascular damage, which leads to retinopathy, nephropathy, and neuropathy. Therefore, diabetes is the leading cause of blindness and visual impairment in adults in developed countries [2] and is responsible for over one million lower limb amputations each year. Diabetic people are also exposed to an elevated risk of macrovascular complications, where they are two to four times more likely to get cardiovascular disease (CVD) than people without diabetes. Due to these complications, diabetes is found to be the fourth leading cause of global death by disease.

The prevalence of type 2 diabetes is increasing at a fast pace due to obesity (in particular, central obesity), physical inactivity, and unhealthy dietary habits [3]. Early detection of diabetes would be of great value given the fact that at least 50%, and 80% in some countries, of all people with diabetes are unaware of their condition and will remain unaware until complications appear [2], [4].

Recent studies have shown that 80% of type 2 diabetes complications can be prevented or delayed by early identification and intervention in people at risk [2], [4], for example, by changing their lifestyle [3] and/or by therapeutic methods. Intelligent data analysis techniques, such as data mining and machine learning, are therefore valuable for identifying those people. In this context, several data mining and machine learning methods have been proposed for the diagnosis and management of diabetes [5]–[9]. It has also been shown that employing computer-aided diagnostic (CAD) systems as a “second opinion” has led to improved diagnostic decisions [10], and support vector machines (SVMs) have shown remarkable success in this area [11].

In this study, we propose a novel hybrid model for medical diagnosis, integrating three different data mining/machine learning techniques. In particular, an unsupervised and a supervised learning algorithm are used for sampling and model building, respectively, and are then followed by a rule-based explanation component. Furthermore, we validate our model using a real-life dataset for the prediction of type 2 diabetes. As we will demonstrate, our technique is able to predict diabetes with high accuracy, sensitivity, and specificity, and outperforms other techniques [8], [9] working on relevant problems. We also show that the diagnostic criteria learned by our model are valid from a medical standpoint and that our results are supported by other medical studies [12].

The paper is organized as follows. A brief background for SVMs and rule extraction from SVMs is provided in Section II. The experimental methodology is presented in Section III, followed by results and discussion in Section IV. Rule interpretation and validation is demonstrated in Section V and some conclusions are drawn in Section VI.

II. BACKGROUND

A. Support Vector Machines

SVMs operate by finding a linear hyperplane that separates the positive and negative examples with a maximum interclass distance or margin d. In the case of unequal misclassification costs, a cost factor J (C+/C−) is introduced by which training
errors on positive examples outweigh errors on negative examples [13]. Therefore, the optimization problem becomes

$$\text{minimize}\quad \frac{1}{2}\|\mathbf{w}\|^{2} + C_{+}\sum_{i:\,y_i=1}\xi_i + C_{-}\sum_{j:\,y_j=-1}\xi_j \quad \text{subject to}\quad y_k(\mathbf{w}\cdot\mathbf{x}_k + b) \ge 1-\xi_k,\quad \xi_k \ge 0 \tag{1}$$

where $y_i$ is the class label, $\mathbf{w}$ is normal to the hyper-plane, $|b|/\|\mathbf{w}\|$ is the perpendicular distance from the hyper-plane to the origin, $\|\mathbf{w}\|$ is the Euclidean norm of $\mathbf{w}$, $C$ is a regularization parameter, which defines the tradeoff between the training error and the margin d, and $\xi_i$ is a slack variable to allow errors in classification [13].

To handle nonlinearly separable data, kernel functions are used. Introducing a kernel function $K$ and Lagrange multipliers $\alpha_i$, the dual optimization problem becomes

$$\text{maximize}\quad w(\alpha) = \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1,\,j=1}^{l}\alpha_i y_i \alpha_j y_j K(\mathbf{x}_i,\mathbf{x}_j) \quad \text{subject to}\quad C \ge \alpha_i \ge 0\ \ \forall i,\quad \sum_{i=1}^{l}\alpha_i y_i = 0. \tag{2}$$

Solving for α, training examples with a nonzero α are called support vectors (SVs) and the hyper-plane is completely defined by the SVs alone.

There is, however, a significant drawback to SVMs: they are unable to provide a comprehensible justification for the classification decisions they make. That is, they are black box models. In medical diagnosis, it has been shown that the explanation of a classification decision is a crucial requirement for the acceptance of black box models by end users [14]–[16]. Therefore, techniques for rule extraction have been introduced [17] to enable SVMs to be more intelligible.

B. Rule Extraction From SVMs

Direct rule learners extract rules that describe a pattern or relationships between input features and output class labels directly from the data. In the case of black box models, these relationships are not comprehensible to end users. Therefore, the task of rule extraction is to devise rules from the model, rather than directly from the data. In doing so, an explanation of the knowledge learned by the black box model (from the data) and embedded in the structure of the model (SVs in the case of SVMs) is revealed and provided to the end users in a comprehensible form.

In this paper, we utilize two different techniques for rule extraction as the last module of our proposed approach. In particular, the SQRex-SVM [18] and the eclectic [19] methods are used to turn the SVM black box into a more intelligible model. In the following subsections, a brief description of these methods is given.

1) SQRex-SVM: A Sequential Covering Approach for Rule Extraction: SQRex-SVM [18] extracts rules directly from a subset of the SVM SVs [see (2)], using a modified sequential covering algorithm and based on an ordered search of the most discriminatory features. These features are then ranked and utilized to form an initial ruleset for the positive class, which is then refined and pruned. A prepruning strategy is adopted to prune rules with performance below a user-defined threshold. In the postpruning step, the algorithm utilizes the area under the receiver operating characteristic (ROC) curve (AUC) to control the tradeoff between the classifier (ruleset) performance, in terms of both the true positive (TP) and false positive (FP) rates and AUC, and comprehensibility as measured by the number of rules. Rules that do not result in a statistically significant (p < 0.05) increase in the ruleset AUC are pruned.

2) Eclectic Rule Extraction: In this approach [19], a labeled dataset is used to train an SVM to obtain a model (classifier) with acceptable accuracy, precision, and recall. Next, a synthetic dataset composed of the training examples that became SVs is constructed with the target class for these examples replaced by the class predicted by the SVM. Rules representing the concepts learned by the SVM are then extracted from the synthetic dataset using the C5 decision tree learner [20].

III. EXPERIMENTAL METHODOLOGY

A. Dataset

The dataset used in this paper has previously been used in [21], which investigated the prevalence of diabetes in Oman. Another study on this data [22] used data mining methods to learn rules for the diagnosis of diabetes. However, in this paper, we will be employing SVMs for the first time for the diagnosis of diabetes and justifying the SVM’s classification decisions by the extracted rules. A detailed description of the dataset can be found in [21]. However, for convenience, a brief description is provided next.

Data from 4682 subjects of age 20 years and above was collected using a questionnaire regarding demographic data, history, and anthropometric measures. Furthermore, blood pressure and blood samples were analyzed to measure fasting venous and 2-h postglucose load [oral glucose tolerance test (OGTT)] for the participants of the survey in healthcare centers. Other attributes were also collected, but omitted here as they are not relevant to this study.

The data collected about each subject include age in years (min 20 and max 80), sex (male/female), family history of diabetes (yes/no), body mass index (BMI) (min 14 and max 47), waist circumference in centimeters (waist) (min 45 and max 120), hip circumference (min 66 and max 125 cm), systolic blood pressure (min 90 and max 220), diastolic blood pressure (BPDIAS) (min 50 and max 120), cholesterol (min 1.9 and max 8.5 mmol/L), fasting blood sugar (FBS) (min 55 and max 320 mg/dL), and 2-h postglucose load (OGTT) (min 38 and max 570 mg/dL).

The diabetes detection method used in this dataset is the OGTT, according to the 1985 World Health Organization (WHO) criteria, where the subject is considered diabetic if their OGTT is equal to or greater than 200 mg/dL. Diabetic subjects who were taking oral hypoglycemic tablets or insulin (i.e., who were diagnosed as diabetic prior to the survey) were excluded from the study. In addition, all subjects with missing values in
1116 IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 14, NO. 4, JULY 2010
TABLE I
DATASET
TABLE II
RULE PERFORMANCE COMPARED TO THE SVM AND DIRECT RULE LEARNERS AT EQUAL MISCLASSIFICATION COSTS
TABLE III
RULE TP, FP RATES, AND AUC COMPARED TO THE SVM AND DIRECT RULE LEARNERS AT EQUAL MISCLASSIFICATION COSTS
TABLE IV
RULES AUC AT DIFFERENT MISCLASSIFICATION COSTS
Fig. 1. ROC curves for the eclectic rules and the SVM.
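The measures named in these captions (TP rate, FP rate) and those quoted in the abstract (accuracy, sensitivity, specificity) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the function names `confusion_counts` and `summarize` and the toy labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, tn, fn) for binary labels; `positive` marks the diabetic class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Derive the reported measures; assumes both classes occur in the data."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # also called the TP rate
        "specificity": tn / (tn + fp),
        "fp_rate": fp / (fp + tn),      # equals 1 - specificity
    }


# Toy labels invented for illustration (1 = diabetic, 0 = nondiabetic).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
measures = summarize(*counts)
print(counts, measures)
```

Because the FP rate is one minus the specificity, a classifier can be placed on a ROC plot from its sensitivity and specificity alone.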
hypothesis shows that there is no difference in measured AUC between the two approaches cannot be rejected (p > 0.05). Furthermore, the differences in AUC between the SVM and those of the eclectic and the SQRex-SVM approaches are not statistically significant (p > 0.05). It should be noted that the same training and testing sets described in Table I have been used to train the SVM and direct rule learners, and the same test set is used to assess the quality of the SVM model, the extracted rules, and the rules obtained by direct rule learners.

The ROC curves for the performance of the SVM and the extracted rules on the test set at different misclassification costs are shown in Figs. 1 and 2, respectively.

Comparing the ROC curves for the SVM and the extracted rulesets, it can be seen that both curves follow the same pattern with increasing misclassification cost. It can also be seen that the ROC curves for both the eclectic and SQRex-SVM methods are almost identical to the SVM ROC curves, which is also reflected by their AUC. It should be noted here that the ROC curve had to be manually connected to the point (1,1) for the SVM and extracted rulesets (as no value of J ensured the SVM always predicted positive).

The AUC for these ROC curves and the associated standard errors, as well as those of the direct rule learners, are shown in Table IV. From this table, it can be seen that C5 has a slightly higher AUC than the rules extracted from the SVM. However, this difference in AUC is not statistically significant (p > 0.05). Furthermore, the rulesets extracted from SVMs have a smaller number of rules, and therefore, improved comprehensibility. In addition, the difference between the AUC of rules extracted from the SVM by the two approaches, as well as the difference in AUC between each approach and the original SVM, are not statistically significant (p > 0.05).

We have also compared our results with those of similar studies, e.g., [8], [9], where the classification and regression tree (CART) decision tree [27] has been used for the prediction of diabetes or of people at risk of diabetes; it can be seen that our approach has obtained improved results. The best results obtained in [9] were 88%, 75%, and 85%, while the best results obtained in [8] were 94.5%, 38%, and 73.6% for sensitivity, specificity, and AUC, respectively.

V. DIAGNOSTIC RULES VALIDATION AND INTERPRETABILITY

After confirming the quality of the rules extracted by the two methods and showing that they provide a succinct explanation of the concepts learned by SVMs, we now look at a major complementary measure of rule quality, rule interpretability, and whether these rules make sense to domain experts.

In principle, the rules extracted by the two methods have shown the correct and valid risk factors (features), which are used for the diagnosis and prediction of diabetes in their antecedent (namely, FBS, BPDIAS, and waist circumference).
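The AUC comparisons reported in Section IV rest on ROC curves built from a handful of operating points (one per misclassification cost) that had to be connected manually to the point (1,1). A minimal sketch of how such an area could be computed with the trapezoidal rule follows. It is an illustration under stated assumptions, not the authors' procedure: the function name `roc_auc` and the example points are hypothetical, and adding the origin (0,0) as the other end point is an assumption of this sketch.

```python
def roc_auc(points):
    """Trapezoidal area under a ROC curve given (fp_rate, tp_rate) points.

    The end points (0, 0) and (1, 1) are added if missing, mirroring the
    manual connection to (1, 1) described in the text.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # one trapezoid per segment
    return area


# Operating points invented for illustration, one per cost factor J.
example_points = [(0.1, 0.6), (0.2, 0.8), (0.4, 0.9)]
example_auc = roc_auc(example_points)
print(round(example_auc, 4))
```

For a classifier with a single operating point, this construction reduces to the mean of sensitivity and specificity, which is one reason curves with few points should be compared together with their standard errors, as done in Table IV.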
A. SQRex-SVM Rules

It was shown in Section IV that SQRex-SVM extracts the best rules in terms of rule quality from the machine learning point of view, i.e., simplest and of best accuracy. It will therefore be interesting to see if these rules have the same quality when evaluated from the medical expert’s perspective.

The following are the rules extracted by SQRex-SVM at equal misclassification cost.
1) IF FBS > 106.2
   Then diabetic.
2) IF waist circumference ≥ 91
   and BPDIAS ≥ 90
   Then diabetic.
Default rule, nondiabetic.

If we consider the first rule, this rule is valid from the medical perspective, as the diagnosis of diabetes can be solely done by the FBS [1]. Considering the cutoff value of 106.2, this value is supported by a previous study [29], which used two datasets (one of them being the data used in this study) and concluded that this value (106.2 mg/dL), even though it is lower than the cutoff value defined by the American Diabetes Association (ADA) and WHO for the diagnosis of diabetes [1], is the value that gave the best AUC, sensitivity, and specificity for the Omani population; hence, it is the value that best matches an OGTT ≥ 200 mg/dL. Al-Lawati and Barakat [29] have pointed out that if the ADA cutoff is applied, it would lead to underestimation of diabetes by 18% in the Omani population. Similar studies [30]–[32] have also shown that the ADA criteria would underestimate diabetes in other populations. This rule can also be used to discover undiagnosed diabetic subjects missed by the ADA and WHO criteria.

Considering the second rule, recent studies have shown an association between raised blood pressure (hypertension) and central obesity, which is mainly measured by waist circumference, and the risk of developing diabetes and/or metabolic syndrome [33]. A risk score for developing diabetes has been proposed in [12], where hypertension contributed three points and waist circumference contributed two points out of a ten-point scale for all diabetes risk factors. Furthermore, subjects with metabolic syndrome also face an elevated risk of developing type 2 diabetes and CVDs [34]. The International Diabetes Federation (IDF) defines metabolic syndrome as the presence of central obesity, plus any two of the following factors.
1) Raised triglyceride level.
2) Reduced high-density lipoprotein cholesterol.
3) Raised blood pressure (systolic blood pressure ≥ 130 or BPDIAS ≥ 85 mm Hg).
4) Raised fasting plasma glucose (FPG ≥ 100 mg/dL, or previously diagnosed as type 2 diabetes).

As mentioned earlier, central obesity is measured by waist circumference, but it is also ethnicity specific [33].

Comparing the IDF cutoff points for waist circumference with the one obtained by the second rule, it can be seen that the cutoff point we obtained for waist circumference is valid, as it is close to the values defined by IDF [33] for different ethnic groups. It should also be noted that there is no cutoff value specifically defined for the Middle East (Arab) population [33], and there is a recommendation to use the European data until more specific data becomes available.

To further validate the second rule, we have plotted two ROC curves for the waist circumference as a predictor of the two most common risk factors for metabolic syndrome, namely, hyperglycemia and hypertension, as shown in Figs. 3 and 4. The waist circumference thresholds (cutoff values) used to plot these ROC curves were <75, ≥75 and <80, ≥80 and <85, ≥85 and <91, and ≥91 cm.

Fig. 3. ROC curve for waist circumference as predictor for hyperglycemia.

From these figures, it can be seen that a waist circumference of 91 cm is a better predictor of each of these risk factors as compared to lower waist circumferences. This is more evident in males than females (p < 0.05). We have also investigated the distribution of diabetic subjects (as defined by the WHO criteria) over different groups of waist circumferences. As Fig. 5 shows, the distribution confirms our results, where the highest percentage of diabetic subjects have a waist circumference ≥91 cm, followed by the range of waist circumference ≥85 cm.

Based on the discussion earlier, it can be concluded that the second rule can be used for opportunistic screening to identify people at risk [12] (predicting diabetes). If indicated, further investigation should be carried out and an early intervention program perhaps started to control the development of diabetes [3].
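The SQRex-SVM ruleset quoted in this section is small enough to be applied directly as an ordered decision list. The sketch below shows one way to encode it. The function name `sqrex_svm_rules` and the example subjects are hypothetical, and the units are assumptions taken from the surrounding text (FBS in mg/dL, waist circumference in cm, BPDIAS in mm Hg); it is an illustration, not a validated screening tool.

```python
def sqrex_svm_rules(fbs, waist, bpdias):
    """Apply the two extracted rules in order and fall back to the default rule.

    fbs    -- fasting blood sugar, assumed mg/dL
    waist  -- waist circumference, assumed cm
    bpdias -- diastolic blood pressure, assumed mm Hg
    Returns (label, rule_number); rule_number 0 denotes the default rule.
    """
    if fbs > 106.2:                      # rule 1: strict inequality, as quoted
        return "diabetic", 1
    if waist >= 91 and bpdias >= 90:     # rule 2: both conditions must hold
        return "diabetic", 2
    return "nondiabetic", 0              # default rule


# Subjects invented for illustration only.
examples = [
    (110.0, 80.0, 70.0),   # raised FBS -> rule 1
    (95.0, 95.0, 92.0),    # large waist and raised BPDIAS -> rule 2
    (95.0, 95.0, 85.0),    # large waist only -> default
    (106.2, 90.0, 95.0),   # FBS exactly at the cutoff, waist below 91 -> default
]
results = [sqrex_svm_rules(*subject) for subject in examples]
print(results)
```

Returning the number of the rule that fired keeps the explanation attached to each prediction, which is the point of pairing the SVM with an extracted ruleset.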
Fig. 5. Distribution of diabetic subjects over waist circumference groups.

Figs. 3–5 also validate our rule, which suggests that the cutoff value of 91 cm can be used as a guiding or starting point for the definition of central obesity in both males and females in the Arabian Gulf area population. It can also be expected that a cutoff value between 85 and 91 cm for both males and females is more suitable for defining central obesity as compared to the cutoff value for Europeans suggested by the IDF [33]. However, more retrospective studies are needed to decide the best cutoff value.

It should be noted that the cutoff values for FBS as well as the waist circumference in our rules, which mainly diagnose diabetes, were the same for both males and females, which is also the case in the WHO and IDF criteria. However, we have plotted separate ROC curves for males and females (see Figs. 3 and 4), for predicting different risks of metabolic syndrome, where IDF standards distinguish between males and females regarding waist circumference cutoff values. Unlike the IDF standards, our results suggest that the waist circumference cutoff value for predicting the risk of metabolic syndrome is the same for both males and females. However, there are other studies that also did not distinguish between males and females [35]. It is also worthwhile mentioning here that the average waist circumference of males and females is the same in the survey data, which explains the same cutoff value for both genders.

VI. CONCLUSION

In this paper, we have developed a hybrid system for medical diagnosis. In particular, we have employed SVMs for the diagnosis and prediction of diabetes, where an additional rule-based explanation component is utilized to provide comprehensibility. The SVM and the rules extracted from it are intended to work as a second opinion for the diagnosis of diabetes and as a tool to predict diabetes through identifying people at high risk. According to domain experts, the significance of our approach lies in its simplicity, comprehensibility, and validity. They find the rules produced by the system really helpful, where simple measurements in the outpatient setting could be used for opportunistic screening. This would offer an enhanced opportunity for timely and appropriate intervention to take place, which would reduce or control the incidence of diabetes or its expensive complications. Furthermore, the rules can help in the detection of undiagnosed subjects. Results show that our model is of high quality in terms of diagnostic and prediction accuracy, and outperforms other techniques working on similar problems.

One of the potential future extensions of this work is to conduct a prospective study to further refine the predictive results obtained by the proposed rules. Specifically, this could be achieved by following up the subjects with a waist
circumference ≥91 cm and/or BPDIAS ≥90 for a 3 to 5 year period to see if, and when, they develop diabetes. Based on the outcomes of such a study, a more sophisticated risk score could be developed, which could significantly decrease healthcare costs via early prediction and diagnosis of diabetes.

REFERENCES

[1] WHO/IDF 2006 (2007, Jan.). Definition and diagnosis of diabetes mellitus and intermediate hyperglycemia, World Health Organisation [Online]. Available: http://www.who.int/diabetes/publications/Definition%20and%20diagnosis%20of%20diabetes_new.pdf
[2] International Diabetes Federation, Diabetes Atlas, 3rd ed. Brussels, Belgium: International Diabetes Federation, 2007.
[3] M. Uusitupa, “Lifestyle matter in prevention of type 2 diabetes,” Diabetes Care, vol. 25, no. 9, pp. 1650–1651, 2002.
[4] M. Franciosi, G. D. Berardis, M. C. E. Rossi, and M. Sacco, “Use of the diabetes risk score for opportunistic screening and impaired glucose tolerance,” Diabetes Care, vol. 28, no. 5, pp. 1187–1193, 2005.
[5] Y. Huang, P. McCullagh, N. Black, and R. Harper, “Feature selection and classification model construction on type 2 diabetic patient’s data,” in Lecture Notes Artificial Intelligence, vol. 3275, P. Perner, Ed. Berlin, Germany: Springer-Verlag, 2004, pp. 153–162, ICDM 2004.
[6] R. Bellazzi, C. Larizza, P. Magni, S. Montani, and M. Stefanelli, “Intelligent analysis of clinical time series: An application in the diabetes mellitus domain,” Artif. Intell. Med., vol. 20, pp. 37–57, 2000.
[7] R. Bellazzi, “Telemedicine and diabetes management: Current challenges and future research directions,” J. Diabetes Sci. Technol., vol. 2, no. 1, pp. 98–104, 2008.
[8] R. Goel, A. Misra, D. Kondal, R. M. Pandey, N. K. Vikram, J. S. Wasir, V. Dhingra, and K. Luthra, “Identification of insulin resistance in Asian Indian adolescents: Classification and regression tree (CART) and logistic regression-based classification rules,” Clin. Endocrinol., vol. 70, pp. 717–724, 2009.
[9] K. E. Heikes, B. Arondekar, D. M. Eddy, and L. Schlessinger, “Diabetes risk calculator, a simple tool for detecting undiagnosed diabetes and pre-diabetes,” Diabetes Care, vol. 31, no. 5, pp. 1040–1045, 2008.
[10] N. Lavrac, E. Keravnou, and B. Zupan, “Intelligent data analysis in medicine,” in Encyclopedia of Computer Science and Technology, vol. 42, A. Kent et al., Ed. New York: Marcel Dekker, 2000, pp. 113–157.
[11] W. Kong, L. Tham, K. Y. Wong, and P. Tan, “Support vector machine approach for cancer detection using amplified fragment length polymorphism (AFLP) method,” presented at the 2nd Asia-Pac. Bioinformatics Conf. (APBC 2004), Dunedin, New Zealand.
[12] J. A. Al-Lawati and J. Tuomilehto, “Diabetes risk score in Oman: A tool to identify prevalent type 2 diabetes among Arabs of the middle east,” Diabetes Res. Clin. Pract., vol. 77, no. 3, pp. 438–444, 2007.
[13] K. Morik, P. Brockhausen, and T. Joachims, “Combining statistical learning with knowledge-based approach - A case study in intensive care monitoring,” in Proc. Eur. Conf. Mach. Learn., 1998, pp. 268–277.
[14] R. L. Ye and P. E. Johnson, “The impact of explanation facilities on user acceptance of expert systems advice,” MIS Q., vol. 19, pp. 157–172, Jun. 1995.
[15] Z. Chen, J. Li, and L. Wei, “A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue,” Artif. Intell. Med., vol. 41, pp. 161–175, 2007.
[16] C. J. Wyatt and D. G. Altman, “Prognostic models: Clinically useful or quickly forgotten?” Brit. Med. J., vol. 311, pp. 1539–1541, 1995.
[17] N. Barakat, “Rule extraction from support vector machines: Medical diagnosis prediction and explanation,” Ph.D. thesis, School Inf. Technol. Electr. Eng. (ITEE), Univ. Queensland, Brisbane, Australia, 2007.
[18] N. Barakat and A. P. Bradley, “Rule extraction from support vector machines: A sequential covering approach,” IEEE Trans. Knowl. Data Eng., vol. 19, no. 6, pp. 729–741, Jun. 2007.
[19] N. Barakat and A. P. Bradley, “Rule extraction from support vector machines: Measuring the explanation capability using the area under the ROC curve,” presented at the 18th Int. Conf. Pattern Recognit. (ICPR 2006), Hong Kong.
[20] J. R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann, 1993.
[21] M. Asfour, A. Lambourne, A. Soliman, S. Al-Behlani, D. Al-Asfoor, and A. Bold, “High prevalence of diabetes mellitus and impaired glucose tolerance in the Sultanate of Oman: Results of the 1991 national survey,” Diabetic Med., vol. 12, pp. 1122–1125, 1995.
[22] M. N. Barakat, N. Barakat, J. Diederich, and J. Al Lawati, “Diagnosis of diabetes mellitus: A data mining approach,” Int. J. Diabetes Metab., vol. 13, no. 1, p. 42, 2005.
[23] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. New York: Addison-Wesley, 2005.
[24] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.
[25] N. Japkowicz, “The class imbalance problem: Significance and strategies,” in Proc. Int. Conf. Artif. Intell. (IC-AI 2000), pp. 111–117.
[26] T. Joachims, Learning to Classify Text Using Support Vector Machines. Norwell, MA: Kluwer, 2002.
[27] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks, 1984.
[28] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco, CA: Morgan Kaufmann, 2005.
[29] J. A. Al-Lawati and M. N. Barakat, “Fasting cut-points in determining prevalence of diabetes in an Arab population of the middle east,” Diabetes Res. Clin. Prac., vol. 75, no. 2, pp. 241–245, 2007.
[30] M. I. Harris, R. C. Eastman, C. C. Cowie, K. M. Flegal, and M. S. Eberhardt, “Comparison of diabetes diagnostic categories in the U.S. population according to the 1997 American Diabetes Association and 1980–1985 World Health Organization diagnostic criteria,” Diabetes Care, vol. 20, no. 12, pp. 1859–1862, 1997.
[31] P. W. Wahl, P. J. Savage, B. M. Psaty, T. J. Orchard, J. A. Robbins, and R. P. Tracy, “Diabetes in older adults: Comparison of 1997 American Diabetes Association classification of diabetes mellitus with 1985 WHO classification,” Lancet, vol. 352, no. 9133, pp. 1012–1015, 1998.
[32] J. E. Shaw, M. D. Courten, E. J. Boyko, and P. Z. Zimmet, “Impact of new diagnostic criteria for diabetes on different populations,” Diabetes Care, vol. 22, no. 5, pp. 762–766, 1999.
[33] K. G. M. M. Alberti, P. Zimmet, and J. Shaw, “Metabolic syndrome: A new worldwide definition. A Consensus Statement from the International Diabetes Federation,” Diabet. Med., vol. 23, pp. 469–480, 2006.
[34] S. Grundy, “Obesity, metabolic syndrome, and cardiovascular disease,” J. Clin. Endocrinol. Metab., vol. 89, no. 6, pp. 2595–2600, 2004.
[35] H. Wahrenberg, K. Hertel, B.-M. Leijonhufvud, L.-G. Persson, E. Toft, and P. Arner, (2005). Use of waist circumference to predict insulin resistance: Retrospective study, Brit. Med. J., viewed Jan. 2007. [Online]. Available: http://www.bmj.com/cgi/rapidpdf/bmj.38429.473310.AEv1

Nahla H. Barakat received the Ph.D. degree in computer science from the University of Queensland, Brisbane, Australia.
She is currently an Associate Professor in computer science with German University of Technology in Oman, Muscat, Oman. She has more than ten years of industry experience in the area of IT in a multinational environment. Her current research interests include machine learning and medical data mining.

Andrew P. Bradley (SM’97) received the Ph.D. degree from the University of Queensland, Brisbane, Australia, in 1996.
Since 1996, he has been a Researcher in Australia, the United Kingdom, and Canada. He is currently an Associate Professor in biomedical engineering with the University of Queensland. His research interests include biomedical applications of pattern recognition in signal and image analysis.
Dr. Bradley is a Chartered Electrical Engineer of the IET.

Mohamed Nabil H. Barakat received the B.Sc. and M.Sc. degrees in internal medicine from Alexandria University, Alexandria, Egypt, and the M.Sc. degree in nephrology and metabolism from Cairo University, Cairo, Egypt.
He is a Consultant Physician and a Diabetologist with the Department of Noncommunicable Disease Surveillance and Control, Ministry of Health, Muscat, Oman. He has long clinical experience in treatment and management protocols of diabetes. He is involved in the national program of diabetes control with the Ministry of Health. His current research interests include diabetes and cardiovascular diseases prevention and control techniques.