

Building a Rule-Based
Classifier—A Fuzzy-Rough Set Approach
Suyun Zhao, Student Member, IEEE, Eric C.C. Tsang, Degang Chen, and Xizhao Wang, Senior Member, IEEE

• S. Zhao is with the School of Mathematics and Computer Sciences, Hebei University, Baoding, China. E-mail: zsyrose2007@yahoo.com.cn.
• E.C.C. Tsang is with the Faculty of Information Technology, The Macau University of Science and Technology, Taipa, Macau. E-mail: cctsang@must.edu.mo.
• D. Chen is with the Department of Mathematics and Physics, North China Electric Power University, China. E-mail: chengdegang@263.net.
• X. Wang is with the School of Mathematics and Computer Sciences, Hebei University, Baoding, China. E-mail: xizhaowang@ieee.org.

Manuscript received 14 Mar. 2008; revised 17 Dec. 2008; accepted 11 Apr. 2009; published online 29 Apr. 2009. Recommended for acceptance by B.C. Ooi. For information on obtaining reprints of this article, please send e-mail to tkde@computer.org, and reference IEEECS Log Number TKDE-2008-03-0145. Digital Object Identifier no. 10.1109/TKDE.2009.118.

Abstract—The fuzzy-rough set (FRS) methodology, a useful tool for handling discernibility and fuzziness, has been widely studied. Some researchers have studied the rough approximation of fuzzy sets, while others have focused on one application of FRS: attribute reduction (i.e., feature selection). However, constructing classifiers with FRS, another application of FRS, has been less studied. In this paper, we build a rule-based classifier by using a generalized FRS model after proposing a new concept named "consistence degree," which is used as the critical value for keeping the discernibility information invariant during rule induction. First, we generalize the existing FRS to a model that is robust to misclassification and perturbation by incorporating a controlled threshold into the knowledge representation of FRS. Second, we propose the concept of "consistence degree" and show, by strict mathematical reasoning, that this concept is a reasonable critical value for reducing redundant attribute values in a database. By employing this concept, we then design a discernibility vector to develop rule induction algorithms. The induced rule set can function as a classifier. Finally, experimental results show that the proposed rule-based classifier is feasible and effective on noisy data.

Index Terms—Knowledge-based systems, fuzzy-rough hybrids, rule-based classifier, IF-THEN rule.

1 INTRODUCTION

REAL-WORLD applications in the areas of artificial intelligence, such as pattern recognition and machine learning, require that their tools be able not only to reduce the dimensionality of large databases, but also to build classifiers for classification problems. The fuzzy-rough set (FRS) methodology [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17] is one widely studied class of tool for reducing database dimensionality (also known as attribute reduction) and building classifiers [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33]. Some approaches have been proposed that use FRS to build classifiers, but most of the resultant classifiers either required the support of other classifiers, had not been shown to have a mathematical foundation, or were not robust to misclassification or perturbation [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44], [45].

The approaches to classifier building with either Rough Sets (RSs) or FRS can be roughly divided into three categories. The first category uses FRS (or RS) as a preprocessing tool [35], [36], [37], [38]. For example, in [35], FRS was not used to induce rules, but merely to reduce features. This was a two-step process in which each step was independent: the first step used RS to select features, and the second step built a rule-based classifier by extracting fuzzy rules from a perceptron network classifier in the reduced space.

The second category uses RS or FRS as part of a classifier-designing tool [39], [40], [41]. The fuzzy-rough classification tree classifier [41], for example, used fuzzy-rough hybrids to quantify the dependency of decision attributes in the decision tree generation mechanism.

The third category uses fuzzy-rough hybrids to construct classifiers without the assistance of other classifiers [42], [43], [44], [45]. However, the resultant classifiers in this category have some observable limitations. For example, the classifiers designed using RS in [42], [43] could operate effectively only on data sets containing discrete values, and they needed to perform a discretization step beforehand on data sets containing real-valued features. Although Wang et al. [45] used RS to design a rule-based classifier without discretization that performed well on some data sets, they did not consider the theoretical structures of the lower and upper approximations, such as their topologic and algebraic properties, or use them in knowledge discovery. That is, there exists a gap between the existing FRS methodology and its application to classifier building.

In this paper, the focus is on narrowing this gap by building a classifier using the FRS methodology. However, building a classifier directly on the existing FRS is not an effective choice, since the existing FRS is not a robust model in real applications: it is sensitive to misclassification and perturbation [44]. Therefore, it is necessary to design a robust classifier based on the FRS methodology.

In this paper, we propose a rule-based classifier by using a generalized FRS framework. First, we generalize FRS to a robust model named Generalized Fuzzy-Rough Sets (denoted by GFRS).

GFRS is used as the foundation for building the rule-based classifier. Second, we propose the consistence degree as the critical value for reducing attribute values; strict mathematical reasoning shows that this concept is reasonable, in that keeping it invariant allows redundant attribute values to be removed safely. Further, we propose a discernibility vector to find all attribute value reductions and investigate the structure of attribute value reductions. These proposed concepts, i.e., the consistence degree and the discernibility vector, are the main contribution of this paper. Fourth, we develop some heuristic algorithms, which achieve near-optimal attribute value reductions and a near-minimal rule set, to build a rule-based classifier. Finally, we compare our proposed method with other rule-based classifiers: RS, FRS, and fuzzy decision tree. The experimental results show the feasibility and effectiveness of our proposed method on noisy data.

The rest of this paper is organized as follows: Section 2 reviews FRS. Section 3 generalizes the fuzzy approximation operators to a robust model named GFRS. Section 4 then proposes the concepts of attribute value reduction based on GFRS and designs a discernibility vector to investigate the structure of attribute value reductions; it also proposes a rule induction algorithm. In Section 5, experimental comparisons between our proposed method and other rule-based classifiers are given. Section 6 concludes this paper.

2 REVIEW OF FUZZY-ROUGH SETS

Theories of RSs and fuzzy sets (FSs) are distinct and complementary for handling vagueness and uncertainty [4], [16]. RS describes the idea of indiscernibility between objects in a set, while FS models the ill-definition of the boundary of a subclass of this set. That is to say, they are not rival theories but capture two distinct aspects of imperfection in knowledge [4]. Many researchers were then inspired to combine them to handle indiscernibility and fuzziness [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. The results of these studies led to the introduction of the notions of FRS.

Generally speaking, FRS is composed of two parts. One is knowledge representation (i.e., the rough approximation of fuzzy sets), while the other is knowledge discovery (i.e., attribute reduction and rule induction). Most researchers have focused on the knowledge representation of FRS using two approaches: the constructive and the axiomatic [3], [4], [5], [6], [7], [8], [9], [10], [11], [14]. The axiomatic approach takes the fuzzy approximation operators as the primary notion and focuses on studying the mathematical structure of FRS, such as its algebraic and topologic structures. On the contrary, the constructive approach takes replacements of the equivalence relation, such as a fuzzy binary relation, a fuzzy T-similarity relation, or a fuzzy weak equivalence partition, as the primary notion. Other notions, such as the lower and upper approximations, are constructed from these replacements.

Summaries of FRS have been given in [1], [9], [13], [23], and interested readers can refer to them. In this paper, we review only those results on FRS summarized in [9], [23] that are helpful for building a rule-based classifier. As preliminaries, the fuzzy decision table used as the study platform of FRS is formulated as follows.

2.1 Fuzzy Decision Table

Let $U$ be a nonempty set with finitely many objects. With every object, we associate a set of condition attributes $R$ and a set of decision attributes $D$. The pair $(U, R \cup D)$ is called a decision table, denoted by $DS$. If some attributes in $DS$ are fuzzy attributes, the decision table is called a fuzzy decision table, denoted by $FD = (U, R \cup D)$. Here, fuzzy attributes mean attributes with real-number values, since these values can be transformed into fuzzy values. For simplicity, we still use $R$ to represent the set of fuzzy attributes.

In a fuzzy decision table $FD = (U, R \cup D)$, $a(x)$ represents the value of $x \in U$ on the attribute $a$, and $P(x) = \{a(x) : a \in P \subseteq R\}$ represents the subset of attribute values of $x \in U$ on the attribute subset $P \subseteq R$. With every $P \subseteq R$, we associate a binary relation $P(x, y)$, called the fuzzy similarity relation of $P$, which satisfies reflexivity ($P(x, x) = 1$), symmetry ($P(x, y) = P(y, x)$), and $T$-transitivity ($P(x, y) \ge T(P(x, z), P(z, y))$) for every $x, y, z \in U$. When the attribute values are symbolic, the fuzzy similarity relation degenerates to an equivalence relation, which generates a partition on $U$, denoted by $U/P = \{[x]_P \mid x \in U\}$, where $[x]_P = \{y \in U \mid a(x) = a(y), \forall a \in P\}$ is the equivalence class containing $x \in U$.

In most practical applications, condition attributes often have real-number values, while decision attributes have symbolic values. This type of fuzzy decision table, with fuzzy condition attributes and symbolic decision attributes, thus serves as the platform for designing the rule-based classifier using FRS in this paper.

2.2 Fuzzy-Rough Sets

The concept of FRS was first proposed by Dubois and Prade [3], [4]; their idea was as follows. Let $U$ be a nonempty universe, $R$ a fuzzy binary relation on $U$, and $F(U)$ the fuzzy power set of $U$. A fuzzy-rough set is a pair $(\underline{R}(F), \overline{R}(F))$ of a fuzzy set $F$ on $U$ such that for every $x \in U$,

$\underline{R}(F)(x) = \inf_{y \in U} \max\{1 - R(x, y), F(y)\};$

$\overline{R}(F)(x) = \sup_{y \in U} \min\{R(x, y), F(y)\}.$

In [10], the above Dubois and Prade fuzzy-rough set was generalized from max/min to a residuated implicator $\vartheta$ and a general triangular norm $T$ with respect to a fuzzy similarity relation $R$. The lower and upper approximation operators of a fuzzy set $A$ are defined, for every $A \in F(U)$, as

$\underline{R}A(x) = \inf_{u \in U} \vartheta(R(u, x), A(u));$

$\overline{R}A(x) = \sup_{u \in U} T(R(u, x), A(u)).$

However, another upper approximation operator was proposed to obtain the dual of $\underline{R}A$, since $\underline{R}A$ and $\overline{R}A$ were not dual with each other [5]. Suppose $S$ is the dual triangular conorm of $T$; then $R(A)(x) = \sup_{y \in U} \sigma(1 - R(x, y), A(y))$ defines another kind of upper approximation of a fuzzy set $A$.

The aforementioned approximation operators can be summarized as the following four general fuzzy approximation operators. For every $A \in F(U)$:

• $T$-upper approximation operator: $R_T A(x) = \sup_{u \in U} T(R(x, u), A(u));$

• $S$-lower approximation operator: $R_S A(x) = \inf_{u \in U} S(N(R(x, u)), A(u));$

• $\sigma$-upper approximation operator: $R_\sigma A(x) = \sup_{u \in U} \sigma(N(R(x, u)), A(u));$

• $\vartheta$-lower approximation operator: $R_\vartheta A(x) = \inf_{u \in U} \vartheta(R(x, u), A(u)).$
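To make these operators concrete, here is a minimal runnable sketch (ours, not the authors' code) of the four operators, instantiated with the Godel pair $T = \min$, $S = \max$, the standard negator, and the corresponding residuated implicator and its dual; the similarity matrix and fuzzy set below are hypothetical.

```python
# A sketch of the four general fuzzy approximation operators
# (Godel instantiation; toy data for illustration only).
N     = lambda a: 1.0 - a                    # standard negator
T     = lambda a, b: min(a, b)               # Godel T-norm
S     = lambda a, b: max(a, b)               # its dual T-conorm
theta = lambda a, b: 1.0 if a <= b else b    # Godel residuated implication
sigma = lambda a, b: 0.0 if a >= b else b    # dual of theta w.r.t. N

def upper_T(R, A, x):      # R_T A(x) = sup_u T(R(x,u), A(u))
    return max(T(R[x][u], A[u]) for u in range(len(A)))

def lower_S(R, A, x):      # R_S A(x) = inf_u S(N(R(x,u)), A(u))
    return min(S(N(R[x][u]), A[u]) for u in range(len(A)))

def upper_sigma(R, A, x):  # R_sigma A(x) = sup_u sigma(N(R(x,u)), A(u))
    return max(sigma(N(R[x][u]), A[u]) for u in range(len(A)))

def lower_theta(R, A, x):  # R_theta A(x) = inf_u theta(R(x,u), A(u))
    return min(theta(R[x][u], A[u]) for u in range(len(A)))

# Hypothetical fuzzy similarity relation and fuzzy set:
R = [[1.0, 0.7, 0.2],
     [0.7, 1.0, 0.2],
     [0.2, 0.2, 1.0]]
A = [1.0, 0.9, 0.1]
print([lower_S(R, A, x) for x in range(3)],
      [upper_T(R, A, x) for x in range(3)])   # -> [0.8, 0.8, 0.1] [1.0, 0.9, 0.2]
```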


This type of definition is seen as the membership function representation of the fuzzy approximation operators, since the operators are defined through membership functions.

From the viewpoint of granular computing, $R_T A$ and $R_\vartheta A$, and $R_\sigma A$ and $R_S A$, form two pairs of approximation operators, respectively, since they can be represented by their individual fuzzy granules $R_T x_\lambda$ and $R_\sigma x_\lambda$, where a fuzzy point $x_\lambda$ is defined as [8]

$x_\lambda(z) = \lambda$ if $z = x$, and $x_\lambda(z) = 0$ if $z \neq x$, for all $z \in U$.

Their granular representation is as follows [8]:

$R_\vartheta A = \cup\{R_T x_\lambda : R_T x_\lambda \subseteq A\}$;  $R_T A = \cup\{R_T x_\lambda : x_\lambda \subseteq A\}$;

$R_S A = \cup\{R_\sigma x_\lambda : R_\sigma x_\lambda \subseteq A\}$;  $R_\sigma A = \cup\{R_\sigma x_\lambda : x_\lambda \subseteq A\}$.

It is easy to see that there are two inclusion conditions, $R_T x_\lambda \subseteq A$ and $R_\sigma x_\lambda \subseteq A$, in the granular representation of the fuzzy approximation operators. These conditions control which fuzzy sets $R_T x_\lambda$ and $R_\sigma x_\lambda$ can be included in the fuzzy lower approximation operators $R_\vartheta A$ and $R_S A$. A small perturbation of $R_T x_\lambda$ or $R_\sigma x_\lambda$ will change the result of $R_\vartheta A$ and $R_S A$. This shows that the conditions $R_T x_\lambda \subseteq A$ and $R_\sigma x_\lambda \subseteq A$ are too harsh for constructing the fuzzy lower approximation operators $R_\vartheta A$ and $R_S A$, and consequently the existing FRS is sensitive to misclassification and perturbation.

In Fig. 1, we use a demonstration graph to show the sensitivity of the existing FRS. In $R_\sigma x_\lambda \subseteq A$ or $R_T x_\lambda \subseteq A$, $R_\sigma x_\lambda$ or $R_T x_\lambda$ can be represented by a fuzzy set (e.g., $f_1$, $f_2$, and $f_3$ in Fig. 1), and $A$ can be represented by a rectangle. In Fig. 1, we find three kinds of inclusion relations: $f_1$ is included in $A$, most of $f_2$ is included in $A$, and only a small part of $f_3$ is included in $A$. In fact, $f_2$ should be included in $A$ if we ignore some small membership degrees of $f_2$ (these small membership degrees may be caused by misclassification and perturbation). All these show that the two conditions $R_T x_\lambda \subseteq A$ and $R_\sigma x_\lambda \subseteq A$ are sensitive to misclassification and perturbation. As a result, the knowledge representation power of FRS is weak.

Fig. 1. The demonstration graph of the sensitivity of the existing FRS.

3 GENERALIZATION OF FUZZY-ROUGH SETS (GFRSs)

To make FRS less sensitive to misclassification and perturbation, we generalize the existing fuzzy approximation operators in both their membership function representation and their granular representation, and we then prove that these two representations are equivalent. The membership representation of the fuzzy approximation operators in FRS is generalized by introducing a threshold, as follows:

Definition 3.1 (membership function representation). For every $A \in F(U)$ and a given threshold $\alpha \in [0, 1)$, fuzzy lower and upper approximations are, respectively, defined as follows:

$R_\vartheta^\alpha A(x) = \inf_{A(u) \le \alpha} \vartheta(R(x, u), \alpha) \wedge \inf_{A(u) > \alpha} \vartheta(R(x, u), A(u)), \ \forall x \in U;$  (D3.1.1)

$R_T^\alpha A(x) = \sup_{A(u) \ge N(\alpha)} T(R(x, u), N(\alpha)) \vee \sup_{A(u) < N(\alpha)} T(R(x, u), A(u)), \ \forall x \in U;$  (D3.1.2)

$R_S^\alpha A(x) = \inf_{A(u) \le \alpha} S(N(R(x, u)), \alpha) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u)), \ \forall x \in U;$  (D3.1.3)

$R_\sigma^\alpha A(x) = \sup_{A(u) \ge N(\alpha)} \sigma(N(R(x, u)), N(\alpha)) \vee \sup_{A(u) < N(\alpha)} \sigma(N(R(x, u)), A(u)), \ \forall x \in U.$  (D3.1.4)

In this paper, we focus on the pair of approximation operators $R_S^\alpha A$ and $R_\sigma^\alpha A$. The properties and applications of the other pair of approximation operators, $R_T^\alpha A$ and $R_\vartheta^\alpha A$, have been studied in another work [46].
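A minimal sketch (ours) of the thresholded $S$-lower operator (D3.1.3) next to the classical one, under the Lukasiewicz conorm and the standard negator; the toy relation, the fuzzy set, and $\alpha = 0.05$ are assumptions chosen so that a single perturbation-like membership is absorbed by the threshold.

```python
# FRS vs. GFRS S-lower approximation (D3.1.3); hypothetical toy data.
S = lambda a, b: min(1.0, a + b)   # Lukasiewicz T-conorm
N = lambda a: 1.0 - a              # standard negator

def lower_frs(R, A, x):
    # R_S A(x) = inf_u S(N(R(x,u)), A(u))
    return min(S(N(R[x][u]), A[u]) for u in range(len(A)))

def lower_gfrs(R, A, x, alpha):
    # (D3.1.3): memberships A(u) <= alpha are replaced by alpha itself,
    # so tiny (possibly noisy) memberships no longer drag the value down.
    lo = min((S(N(R[x][u]), alpha) for u in range(len(A)) if A[u] <= alpha),
             default=1.0)
    hi = min((S(N(R[x][u]), A[u]) for u in range(len(A)) if A[u] > alpha),
             default=1.0)
    return min(lo, hi)

R = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
A = [1.0, 0.04, 0.0]            # A(x1) = 0.04: a perturbation-like value
print(round(lower_frs(R, A, 0), 4))          # 0.14 -- pulled down by A(x1)
print(round(lower_gfrs(R, A, 0, 0.05), 4))   # 0.15 -- A(x1) treated as alpha
```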


Here, we take (D3.1.3) as an example to show the difference between FRS and GFRS, which may help readers understand Definition 3.1. In FRS, the $S$-lower approximation operator is defined as

$R_S A(x) = \inf_{u \in U} S(N(R(x, u)), A(u)).$

It can be equivalently represented as (D'3.1.3):

$R_S A(x) = \inf_{A(u) \le \alpha} S(N(R(x, u)), A(u)) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u)),$  (D'3.1.3)

whereas, by (D3.1.3),

$R_S^\alpha A(x) = \inf_{A(u) \le \alpha} S(N(R(x, u)), \alpha) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u)).$  (D3.1.3)

Comparing (D3.1.3) and (D'3.1.3), we find that in GFRS the threshold $\alpha$ replaces the membership degree $A(u)$ whenever $A(u) \le \alpha$. By controlling this threshold, some misclassification and perturbation can be ignored. In the following, we further explain this point using the generalized granular representation.

The generalization of the granular representation is obtained by incorporating the concept of a fuzzy cut set into the granular representation of $R_S A$ and $R_\sigma A$. The fuzzy $\alpha$-cut set is defined as

$f_\alpha(x) = f(x)$ if $f(x) > \alpha$, and $f_\alpha(x) = 0$ if $f(x) \le \alpha$,

where $f$ is a fuzzy set defined on $U$.

Definition 3.2 (granular representation). For every $A \in F(U)$ and a given threshold $\alpha \in [0, 1)$, fuzzy lower and upper approximations are, respectively, defined as follows:

$R_S^\alpha A = \cup\{R_\sigma x_\lambda : (R_\sigma x_\lambda)_\alpha \subseteq A\};$  (D3.2.1)

$R_\sigma^\alpha A = \cup\{R_\sigma x_\lambda : x_\lambda \subseteq A_{N(\alpha)}\},$ where $A_{N(\alpha)}(x) = N(\alpha)$ if $A(x) \ge N(\alpha)$, and $A_{N(\alpha)}(x) = A(x)$ otherwise.  (D3.2.2)

Definition 3.2 shows that a fuzzy $\alpha$-cut set is introduced into the granular representation, so that the condition $R_\sigma x_\lambda \subseteq A$ in FRS is relaxed to $(R_\sigma x_\lambda)_\alpha \subseteq A$ in GFRS. The condition $(R_\sigma x_\lambda)_\alpha \subseteq A$ is less sensitive to misclassification and perturbation, since a small perturbation of $R_\sigma x_\lambda$ will not change the result of $R_S^\alpha A$. In the following, we use a demonstration graph to illustrate this point (see Fig. 2).

Fig. 2. The demonstration graph of the lower sensitivity of GFRS.

As compared to Fig. 1, Fig. 2 cuts the fuzzy sets $f_1$, $f_2$, and $f_3$ by a gray rectangle. Fig. 2 shows that after we cut some small membership degrees, $f_1$ and $f_2$ are included in $A$, while $f_3$ is not. Thus, when we ignore some small misclassification and perturbation, more fuzzy sets are chosen to compute the fuzzy lower approximation value. The fuzzy lower approximation value therefore becomes larger, and then the fuzzy positive region becomes larger. That is to say, the knowledge representation power becomes stronger in GFRS. All these show that GFRS is a robust framework with respect to perturbation and misclassification.

The membership function and granular representations of $R_S^\alpha A$ and $R_\sigma^\alpha A$ are equivalent, respectively; i.e., (D3.1.3) and (D3.2.1) are equivalent, and (D3.1.4) and (D3.2.2) are equivalent. We verify this fact by the following theorems. As a preliminary, a lemma about the fuzzy granule $R_\sigma x_\lambda$ is described as follows:

Lemma 3.1. Let

$\lambda = (\cup\{R_\sigma x_\mu : (R_\sigma x_\mu)_\alpha \subseteq A\})(x)$  (L3.1.1)

for $x \in U$; then the following statements hold: $(R_\sigma x_\lambda)_\alpha \subseteq A$, and $(R_\sigma x_\mu)_\alpha \subseteq A$ does not hold for any $\mu > \lambda$. Moreover,

$\cup\{R_\sigma x_\lambda : x_\lambda \subseteq A_{N(\alpha)}\} = \cup\{R_\sigma x_{\lambda'} : \lambda' = A_{N(\alpha)}(x)\}.$  (L3.1.2)

Proof. (L3.1.1) By $\lambda = (\cup\{R_\sigma x_\mu : (R_\sigma x_\mu)_\alpha \subseteq A\})(x)$, there exist $t \in (0, 1]$ and $y \in U$ satisfying $R_\sigma y_t(x) = \lambda$ and $(R_\sigma y_t)_\alpha \subseteq A$, where $R_\sigma y_t(x) = \sigma(N(R(y, x)), t)$. By $(R_\sigma y_t)_\alpha \subseteq A$, we obtain: if $\sigma(N(R(y, z)), t) > \alpha$, then $\sigma(N(R(y, z)), t) \le A(z)$ for any $z \in U$. For all $z \in U$,

$R_\sigma x_\lambda(z) = \sigma(N(R(x, z)), \lambda) = \sigma(N(R(x, z)), \sigma(N(R(y, x)), t)) = \sigma(S(N(R(x, z)), N(R(y, x))), t) \le \sigma(N(R(y, z)), t)$

(using the $T$-transitivity of $R$ and the antimonotonicity of $\sigma$ in its left argument). Thus we have: if $\sigma(N(R(y, z)), t) > \alpha$, then $\sigma(N(R(x, z)), \lambda) \le A(z)$, i.e., $(R_\sigma x_\lambda)_\alpha \subseteq A$ holds.

Assume that $(R_\sigma x_\mu)_\alpha \subseteq A$ for some $\mu > \lambda$; then by $\lambda = (\cup\{R_\sigma x_\mu : (R_\sigma x_\mu)_\alpha \subseteq A\})(x)$, we get $\mu \le R_\sigma x_\mu(x) \le \lambda$, which contradicts the assumption $\mu > \lambda$. Thus, $(R_\sigma x_\mu)_\alpha \subseteq A$ does not hold for $\mu > \lambda$.

(L3.1.2) If $x_\lambda \subseteq A_{N(\alpha)}$, then $\lambda \le A_{N(\alpha)}(x)$. Since $\sigma(\cdot, \cdot)$ is monotonically increasing in the right argument, we have $(R_\sigma x_\lambda = \sigma(N(R(x, z)), \lambda)) \subseteq (R_\sigma x_{\lambda'} = \sigma(N(R(x, z)), \lambda'))$ for $x_\lambda \subseteq A_{N(\alpha)}$ and $\lambda' = A_{N(\alpha)}(x)$. Thus, $\cup\{R_\sigma x_\lambda : x_\lambda \subseteq A_{N(\alpha)}\} \subseteq \cup\{R_\sigma x_{\lambda'} : \lambda' = A_{N(\alpha)}(x)\}$. It is straightforward to see that $\cup\{R_\sigma x_\lambda : x_\lambda \subseteq A_{N(\alpha)}\} \supseteq \cup\{R_\sigma x_{\lambda'} : \lambda' = A_{N(\alpha)}(x)\}$. Hence, the two unions are equal. □

In the following, we describe and prove two theorems showing that the membership function representation and the granular representation of $R_S^\alpha A$ and $R_\sigma^\alpha A$ are equivalent.

Theorem 3.1. For every fuzzy set $A \in F(U)$, the following two formulas are equal:

$\inf_{A(u) \le \alpha} S(N(R(x, u)), \alpha) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u)), \ \forall x \in U;$  (T3.1.1)

$\cup\{R_\sigma x_\lambda : (R_\sigma x_\lambda)_\alpha \subseteq A\}.$  (T3.1.2)

Proof. For all $x \in U$, we need to prove

$\inf_{A(u) \le \alpha} S(N(R(x, u)), \alpha) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u)) = (\cup\{R_\sigma x_\lambda : (R_\sigma x_\lambda)_\alpha \subseteq A\})(x).$
and (D 3.2.1), and (D 3.1.4) and (D 3.2.2) are equivalent,


Let $\lambda = (\cup\{R_\sigma x_\mu : (R_\sigma x_\mu)_\alpha \subseteq A\})(x)$; then $(R_\sigma x_\lambda)_\alpha \subseteq A$, but $(R_\sigma x_\mu)_\alpha \subseteq A$ does not hold for $\mu > \lambda$ (by Lemma 3.1). Thus, for all $u \in U$,

$(R_\sigma x_\lambda)_\alpha(u) \le A(u)$

$\Leftrightarrow$ $\sigma(N(R(x, u)), \lambda) \le A(u)$ when $A(u) > \alpha$, and $\sigma(N(R(x, u)), \lambda) \le \alpha$ when $A(u) \le \alpha$

$\Leftrightarrow$ $\inf\{c \in [0, 1] : S(N(R(x, u)), c) \ge \lambda\} \le A(u)$ when $A(u) > \alpha$, and $\inf\{c \in [0, 1] : S(N(R(x, u)), c) \ge \lambda\} \le \alpha$ when $A(u) \le \alpha$

$\Leftrightarrow$ $S(N(R(x, u)), A(u)) \ge \lambda$ when $A(u) > \alpha$, and $S(N(R(x, u)), \alpha) \ge \lambda$ when $A(u) \le \alpha$.

Let

$\xi = \inf_{A(u) \le \alpha} S(N(R(x, u)), \alpha) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u));$

then we get $\xi \ge \lambda$. Assume that $\xi > \lambda$; then $(R_\sigma x_\xi)_\alpha \subseteq A$ does not hold. Thus, there exists $u_0$ with

$(R_\sigma x_\xi)_\alpha(u_0) > A(u_0)$

$\Leftrightarrow$ $\sigma(N(R(x, u_0)), \xi) > A(u_0)$ when $A(u_0) > \alpha$, or $\sigma(N(R(x, u_0)), \xi) > \alpha$ when $A(u_0) \le \alpha$

$\Leftrightarrow$ $S(N(R(x, u_0)), A(u_0)) < \xi$ when $A(u_0) > \alpha$, or $S(N(R(x, u_0)), \alpha) < \xi$ when $A(u_0) \le \alpha$.

We then get

$\xi = \inf_{A(u) \le \alpha} S(N(R(x, u)), \alpha) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u)) < \xi,$

which contradicts the assumption $\xi > \lambda$. Thus, $\xi = \lambda$, i.e., for all $x \in U$,

$\inf_{A(u) \le \alpha} S(N(R(x, u)), \alpha) \wedge \inf_{A(u) > \alpha} S(N(R(x, u)), A(u)) = (\cup\{R_\sigma x_\lambda : (R_\sigma x_\lambda)_\alpha \subseteq A\})(x).$ □

Theorem 3.2. For every $A \in F(U)$, the following two formulas are equal:

$\sup_{A(u) \ge N(\alpha)} \sigma(N(R(x, u)), N(\alpha)) \vee \sup_{A(u) < N(\alpha)} \sigma(N(R(x, u)), A(u)), \ \forall x \in U;$  (T3.2.1)

$\cup\{R_\sigma x_\lambda : x_\lambda \subseteq A_{N(\alpha)}\}.$  (T3.2.2)

Proof. By Lemma 3.1, we get $\cup\{R_\sigma x_\lambda : x_\lambda \subseteq A_{N(\alpha)}\} = \cup\{R_\sigma x_{\lambda'} : \lambda' = A_{N(\alpha)}(x)\}$. Then we have

$(\cup\{R_\sigma u_{\lambda'} : \lambda' = A_{N(\alpha)}(u)\})(x) = \sup\{R_\sigma u_{\lambda'}(x) : \lambda' = A_{N(\alpha)}(u)\}$

(since $A_{N(\alpha)}(u) = N(\alpha)$ if $A(u) \ge N(\alpha)$, and $A_{N(\alpha)}(u) = A(u)$ otherwise)

$= \sup_{A(u) \ge N(\alpha)} R_\sigma u_{N(\alpha)}(x) \vee \sup_{A(u) < N(\alpha)} R_\sigma u_{A(u)}(x)$

$= \sup_{A(u) \ge N(\alpha)} \sigma(N(R(x, u)), N(\alpha)) \vee \sup_{A(u) < N(\alpha)} \sigma(N(R(x, u)), A(u)).$

Hence, for all $x \in U$,

$\sup_{A(u) \ge N(\alpha)} \sigma(N(R(x, u)), N(\alpha)) \vee \sup_{A(u) < N(\alpha)} \sigma(N(R(x, u)), A(u)) = (\cup\{R_\sigma x_\lambda : x_\lambda \subseteq A_{N(\alpha)}\})(x).$ □

Theorems 3.1 and 3.2 show that the membership representation and the granular representation of $R_S^\alpha A$ and $R_\sigma^\alpha A$ are equivalent. They are the key theorems for building the rule-based classifier using GFRS, since they are the theoretical foundation for designing the discernibility vector approach. Without them, the discernibility vector could not be designed in GFRS.

4 BUILDING A RULE-BASED CLASSIFIER

This section develops a methodology for building a rule-based classifier from a fuzzy decision table by using GFRS. The methodology is composed of two parts: first, attribute value reduction, and then rule induction from the reduced decision table. Here, the rules are IF-THEN production rules.

4.1 Attribute Value Reduction

Consistence between two objects means that objects with similar condition attribute values belong to the same decision classes. In the following, we present a theorem describing under what condition two objects are consistent in a fuzzy decision table.

Theorem 4.1. Given two objects $x$ and $y$ ($y \neq x$) in $FD = (U, R \cup D)$, if $N(R(x, y)) < R_S[x]_D(x)$, then $[x]_D(y) = 1$.

Proof. We prove it by contradiction. Assume that $[x]_D(y) = 0$; then $R_S[x]_D(x) = \inf_{u \in U} S(N(R(x, u)), [x]_D(u)) \le S(N(R(x, y)), 0) = N(R(x, y))$. This contradicts the given condition, and thus we get $[x]_D(y) = 1$. □

Theorem 4.1 shows that if the distance between two objects is smaller than the value $R_S[x]_D(x)$ (here $N(R(x, y))$ can be seen as a kind of distance between the objects $x$ and $y$), then the two objects belong to the same decision class. If the distance between two objects is greater than or equal to this value, then they may belong to different decision classes. That is, this value is the critical value for keeping two objects consistent. Since misclassification and perturbation occur in a fuzzy decision table, this critical value should be enlarged to fit the robust framework GFRS.


In this paper, we enlarge the critical value to $R_S^\alpha[x]_D(x)$, the enlarged counterpart of $R_S[x]_D(x)$ in the proposed robust framework GFRS. We then obtain the following statement: if the distance between the objects $x$ and $y$ is smaller than the value $R_S^\alpha[x]_D(x)$, then these two objects are consistent, provided we ignore some inconsistent objects caused by misclassification or perturbation. That is to say, if $N(R(x, y)) < R_S^\alpha[x]_D(x)$, then $[x]_D(y) = 1$; if $N(R(x, y)) \ge R_S^\alpha[x]_D(x)$, then $[x]_D(y) = 0$ may happen. It is clear that $R_S^\alpha[x]_D(x)$ is the critical value for keeping the object $x$ consistent with the other objects when some misclassification and perturbation are ignored. Thus, we define the consistence degree of an object in a fuzzy decision table as follows:

Definition 4.1 (consistence degree). Given an object $x$ in $FD = (U, R \cup D)$, let

$Con_{R,\alpha}(D)(x) = R_S^\alpha[x]_D(x);$

$Con_{R,\alpha}(D)(x)$ is then called the consistence degree of $x$ in $FD = (U, R \cup D)$.

Definition 4.1 shows that the consistence degree of each object takes its value at the lower approximation of the object's decision class.

By keeping the consistence degree invariant, some redundant attribute values can be reduced. That is to say, the reduction of attribute values does not cause new indiscernibility information, which allows the fuzzy decision table to retain its main information. In the following, we use this idea to propose the concept of attribute value reduction in $FD = (U, R \cup D)$.

Definition 4.2 (attribute value reduction). Given an object $x$ in $FD = (U, R \cup D)$, if the subset $P \subseteq R$ satisfies the following two statements:

$Con_{R,\alpha}(D)(x) = Con_{P,\alpha}(D)(x);$  (D4.2.1)

$\forall b \in P, \ Con_{R,\alpha}(D)(x) > Con_{P - \{b\},\alpha}(D)(x);$  (D4.2.2)

then $P(x) = \{a(x) : a \in P\}$ is called the attribute value reduction of $x \in U$.

The attribute value $a(x) \in P(x) \subseteq R(x)$ is dispensable in $P(x)$ if $Con_{R,\alpha}(D)(x) = Con_{P - \{a\},\alpha}(D)(x)$; otherwise, it is indispensable. If all attribute values in $P(x)$ are indispensable for the object $x$, then $P(x)$ is independent for $x$.

Definition 4.3 (attribute value core). Given an object $x$ in $FD = (U, R \cup D)$, the collection of the indispensable attribute values in $R(x)$ is the attribute value core of $x$, denoted by $Core_\alpha(x)$.

Definitions 4.2 and 4.3 show that the proposed attribute value reduction and core share the key ideas of attribute reduction and core in RS. That is, an attribute value reduction is a minimal subset keeping the discernibility information of one object invariant, and the attribute value core is the collection of the important attribute values of an object.

In the following, we study the structure of attribute value reductions and the core. As preliminaries, we present two lemmas:

Lemma 4.1. In a fuzzy decision table $FD = (U, R \cup D)$, if $(R_\sigma x_\lambda)_\alpha \subseteq [z]_D$, then $(R_\sigma x_\lambda)_\alpha \subseteq [x]_D$.

Proof. If $[z]_D = [x]_D$, it is straightforward to get $(R_\sigma x_\lambda)_\alpha \subseteq [x]_D$. If $[z]_D \neq [x]_D$, we have $R_\sigma x_\lambda(x) \le \alpha$ for $x \notin [z]_D$, i.e., $\lambda \le \alpha$. When $\lambda \le \alpha$, we get $(R_\sigma x_\lambda)_\alpha = \emptyset \subseteq [x]_D$. □

Lemma 4.2. For two $T$-similarity relations $R$ and $P$, if $\forall x, u \in U$, $R(x, u) \le P(x, u)$, then the following statements hold: (L4.2.1) $R_\vartheta^\alpha A \supseteq P_\vartheta^\alpha A$; (L4.2.2) $R_T^\alpha A \subseteq P_T^\alpha A$.

Proof. For all $x, u \in U$, $R(x, u) \le P(x, u)$ implies

$\inf_{A(u) \le \alpha} \vartheta(R(x, u), \alpha) \wedge \inf_{A(u) > \alpha} \vartheta(R(x, u), A(u)) \ge \inf_{A(u) \le \alpha} \vartheta(P(x, u), \alpha) \wedge \inf_{A(u) > \alpha} \vartheta(P(x, u), A(u))$  (L4.2.1)

(by the antimonotonicity of $\vartheta(\cdot, \cdot)$ in the left argument), whence $R_\vartheta^\alpha A \supseteq P_\vartheta^\alpha A$. Similarly, $R(x, u) \le P(x, u)$ implies

$\sup_{A(u) \ge 1-\alpha} T(R(x, u), 1-\alpha) \vee \sup_{A(u) < 1-\alpha} T(R(x, u), A(u)) \le \sup_{A(u) \ge 1-\alpha} T(P(x, u), 1-\alpha) \vee \sup_{A(u) < 1-\alpha} T(P(x, u), A(u))$  (L4.2.2)

(by the monotonicity of $T(\cdot, \cdot)$ in both arguments), whence $R_T^\alpha A \subseteq P_T^\alpha A$. □

Theorem 4.2. Given $x \in U$ and $P \subseteq R$ in $FD = (U, R \cup D)$, the following two statements are equivalent:

• (T4.2.1) $P(x)$ contains the attribute value reduction of $x$.
• (T4.2.2) The formula $\sigma(N(P(x, y)), \lambda) \le \alpha$ always holds for every $y \notin [x]_D$, where $\lambda = R_S^\alpha[x]_D(x)$.

Proof. (T4.2.1) ⇒ (T4.2.2): Assume that $P(x)$ contains the attribute value reduction of $x$; then $Con_{R,\alpha}(D)(x) = R_S^\alpha[x]_D(x) = Con_{P,\alpha}(D)(x) = P_S^\alpha[x]_D(x)$ holds. Let $\lambda = R_S^\alpha[x]_D(x)$; then $P_S^\alpha[x]_D(x) = \lambda$. By Lemma 4.1, $(P_\sigma x_\lambda)_\alpha \subseteq [x]_D$ holds. Thus, $\sigma(N(P(x, y)), \lambda) \le \alpha$ for every $y \notin [x]_D$.

(T4.2.2) ⇒ (T4.2.1): By the condition $\lambda = R_S^\alpha[x]_D(x)$, we get $\lambda \ge P_S^\alpha[x]_D(x)$ by Lemma 4.2. By the condition $\sigma(N(P(x, y)), \lambda) \le \alpha$ for every $y \notin [x]_D$, we have $(P_\sigma x_\lambda)_\alpha \subseteq [x]_D$. By the granular representation of $R_S^\alpha A$, $(P_\sigma x_\lambda)(x) = \lambda \le P_S^\alpha[x]_D(x)$ holds. Thus, $\lambda = P_S^\alpha[x]_D(x) = R_S^\alpha[x]_D(x)$, i.e., $Con_{R,\alpha}(D)(x) = Con_{P,\alpha}(D)(x)$. By Definition 4.2, $P(x)$ contains the attribute value reduction of the object $x$. □

By Theorem 4.2, we construct the discernibility vector as follows. Suppose $U = \{x_1, x_2, x_3, \ldots, x_n\}$; by $Vector(U, R, D, \alpha)$ we denote an $n \times 1$ vector $(c_j)$, called the discernibility vector of the object $x_i$, such that

$c_j = \{a : \sigma(N(a(x_i, x_j)), \lambda) \le \alpha\}$, for $D(x_i, x_j) = 0$, where $\lambda = R_S^\alpha[x_i]_D(x_i)$;  (V4.1)

$c_j = \emptyset$, for $D(x_i, x_j) = 1$.  (V4.2)
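A sketch of (V4.1)-(V4.2) under the Lukasiewicz instantiation adopted later in Section 5 ($\sigma_L(a, b) = \max\{0, b - a\}$, $N(a) = 1 - a$); the per-attribute similarity matrices, $\lambda$, and $\alpha$ below are hypothetical.

```python
# Discernibility vector of one object x_i (toy data, hypothetical).
sigma = lambda a, b: max(0.0, b - a)   # dual of the Lukasiewicz implication

def discernibility_vector(sim_per_attr, cls, lam, i, alpha):
    """sim_per_attr[a][i][j]: similarity a(x_i, x_j) on attribute a.
    (V4.1): c_j = {a : sigma(N(a(x_i,x_j)), lam) <= alpha} if D(x_i,x_j)=0;
    (V4.2): c_j = {} if D(x_i,x_j)=1."""
    vector = []
    for j in range(len(cls)):
        if cls[j] == cls[i]:                     # D(x_i, x_j) = 1
            vector.append(set())
        else:                                    # D(x_i, x_j) = 0
            vector.append({a for a, s in enumerate(sim_per_attr)
                           if sigma(1.0 - s[i][j], lam) <= alpha})
    return vector

# Two attributes over three objects; x2 differs from x0, x1 in class.
a0 = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.2], [0.2, 0.2, 1.0]]
a1 = [[1.0, 0.8, 0.7], [0.8, 1.0, 0.7], [0.7, 0.7, 1.0]]
print(discernibility_vector([a0, a1], cls=[0, 0, 1], lam=0.7, i=0, alpha=0.05))
# -> [set(), set(), {0}]  (only attribute 0 separates x0 from x2)
```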


The proposed discernibility vector is similar to one column of the discernibility matrix in [47]. $c_j$ is the collection of attributes maintaining $\sigma(N(a(x_i, x_j)), \lambda) \le \alpha$; each attribute in $c_j$ can distinguish $x_i$ from $x_j$.

The following theorems show that by using the discernibility vector, we can find the value reductions and the core.

Theorem 4.3. $Core_\alpha(x) = \{a(x) : c_j = \{a\}$ for some $1 \le j \le n\}$.

Proof. $a(x) \in Core_\alpha(x) \Leftrightarrow Con_{R,\alpha}(D)(x) > Con_{R - \{a\},\alpha}(D)(x) \Leftrightarrow R_S^\alpha([x]_D)(x) > (R - \{a\})_S^\alpha([x]_D)(x)$. Let $\lambda = R_S^\alpha([x]_D)(x)$ and $t = (R - \{a\})_S^\alpha([x]_D)(x)$; then the four formulas $\lambda > t$; $(R_\sigma x_\lambda)_\alpha \subseteq [x]_D$; $((R - \{a\})_\sigma x_t)_\alpha \subseteq [x]_D$; and $((R - \{a\})_\sigma x_\lambda)_\alpha \not\subseteq [x]_D$ hold (by Lemma 4.1) $\Leftrightarrow$ there exists $y \in U$ ($y \neq x$) satisfying $\sigma(N((R - \{a\})(x, y)), \lambda) > \alpha$ and $\sigma(N((R - \{a\})(x, y)), t) \le \alpha$ $\Leftrightarrow$ $\sigma(N(b(x, y)), \lambda) > \alpha$ holds for any $b \in R - \{a\}$, while $\sigma(N(a(x, y)), \lambda) \le \alpha$ holds $\Leftrightarrow$ $c_j = \{a\}$. The statement $c_j = \{a\}$ means that $a$ is the unique attribute maintaining $\sigma(N(a(x, y)), \lambda) \le \alpha$. □

Theorem 4.4. Suppose $P \subseteq R$; then $P(x)$ contains one attribute value reduction of $x$ if and only if $P \cap c_j \neq \emptyset$ for every $c_j \neq \emptyset$.

The proof is straightforward from Theorem 4.2 and the definition of $c_j$.

Corollary 4.1. Suppose $P \subseteq R$; then $P(x)$ is an attribute value reduction of $x$ if and only if $P(x)$ is a minimal subset of $R(x)$ satisfying $P \cap c_j \neq \emptyset$ for every $c_j \neq \emptyset$.

Theorem 4.3 shows that the value core is the collection of the single-element entries of the discernibility vector. Theorem 4.4 and Corollary 4.1 show that by using the discernibility vector, we can find the attribute value reductions: each is a minimal subset having a nonempty intersection with each nonempty entry of the discernibility vector.

A discernibility function $f_{x,\alpha}(FD)$ for an object $x$ in $FD$ is a Boolean function of $m$ Boolean variables $a_1^*, \ldots, a_m^*$ corresponding to the attributes $a_1, \ldots, a_m$, respectively, defined as $f_{x,\alpha}(FD)(a_1^*, \ldots, a_m^*) = \wedge\{\vee(c_j) : 1 \le j \le n\}$, where $\vee(c_j)$ is the disjunction of all variables $a^*$ such that $a \in c_j$. Let $g_{x,\alpha}(FD)$ be the reduced disjunctive form of $f_{x,\alpha}(FD)$, obtained from $f_{x,\alpha}(FD)$ by applying the multiplication and absorption laws as many times as possible. Then there exist $l$ and $R_k \subseteq R$ for $k = 1, \ldots, l$ such that $g_{x,\alpha}(FD) = (\wedge R_1) \vee \cdots \vee (\wedge R_l)$, where every element in $R_k$ appears only once.

Theorem 4.5. $Red_\alpha(x) = \{R_1(x), \ldots, R_l(x)\}$, where $Red_\alpha(x)$ is the collection of all attribute value reductions of $x$, and $R_k(x)$ is the attribute value of $x$ on the attribute subset $R_k$ for $k = 1, \ldots, l$.

The proof is omitted since this theorem is similar to the one in [23].

Theorem 4.5 clearly shows that by using the discernibility vector, we can find all value reductions. In the following, we present the detailed algorithms for finding value reductions by using the discernibility vector.

We first design an algorithm to compute all attribute value reductions of one object by using the discernibility vector approach. For $x_i \in U$ in $FD = (U, R \cup D)$, the algorithm for finding all attribute value reductions of $x_i$ is described in Algorithm 4.1. This algorithm can find all attribute value reductions of one object, but its computational complexity increases exponentially with the number of attributes (i.e., $O(|U| \times 2^{|A|})$, where $|U|$ is the number of objects and $|A|$ is the number of attributes). In practical applications, it is not necessary to find all value reductions, because finding a near-optimal one is enough. We therefore provide a heuristic algorithm (see Algorithm 4.2) to find one near-optimal attribute value reduction by rewriting the part covered by the dotted-line box in Algorithm 4.1. The key idea of this algorithm uses the add-deletion strategy of reduction algorithm construction described in [48].

Algorithm 4.1: Find all attribute value reductions of $x_i$ in $FD = (U, R \cup D)$. (The pseudocode is given as a figure in the original.)

Algorithm 4.2 (heuristic): Find one near-optimal attribute value reduction of the object $x_i$ in $FD = (U, R \cup D)$. (The pseudocode is given as a figure in the original.)

In the following, we use an example to illustrate the method of attribute value reduction proposed in this paper.
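Since Algorithm 4.2's pseudocode is rendered only as a figure in the original, the following greedy sketch is our reading of the add-deletion idea (cf. Corollary 4.1: a value reduction is a minimal subset hitting every nonempty entry of the discernibility vector); the vector entries below are hypothetical.

```python
# Greedy near-optimal attribute value reduction from a discernibility
# vector (our sketch, in the spirit of Algorithm 4.2; not the authors'
# exact procedure).
def near_optimal_value_reduct(vector):
    entries = [c for c in vector if c]           # nonempty entries only
    chosen = []
    uncovered = list(range(len(entries)))
    while uncovered:                             # "add" phase
        best = max({a for k in uncovered for a in entries[k]},
                   key=lambda a: sum(a in entries[k] for k in uncovered))
        chosen.append(best)
        uncovered = [k for k in uncovered if best not in entries[k]]
    for a in list(chosen):                       # "deletion" phase
        rest = [b for b in chosen if b != a]
        if all(any(b in c for b in rest) for c in entries):
            chosen.remove(a)                     # a was redundant
    return set(chosen)

print(near_optimal_value_reduct([set(), {'a', 'c'}, {'a'}, {'c', 'f'}]))
# -> {'a', 'c'}: a minimal subset hitting every nonempty entry
```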


Example 4.1. Given the simple decision table with 10 objects shown in Table 1, there are 10 condition fuzzy attributes $R = \{a, b, c, d, e, f, g, h, k, j\}$ and one symbolic decision attribute $\{D\}$. There are two decision classes: 0 and 1. The objects $\{x_1, x_6, x_8, x_9\}$ belong to class 0 and $\{x_2, x_3, x_4, x_5, x_7, x_{10}\}$ belong to class 1.

TABLE 1. One Simple Decision Table.

In this example, the Lukasiewicz $T$-conorm $S_L(x, y) = \min\{1, x + y\}$ and the negator $N(x) = 1 - x$ are selected as the $T$-conorm and negator operators to construct the $S$-lower approximation operator. The lower approximation is then specified as

$R_S^\alpha A(x) = \inf_{A(u) \le \alpha} \min\{(1 - R(x, u)) + \alpha, 1\} \wedge \inf_{A(u) > \alpha} \min\{(1 - R(x, u)) + A(u), 1\}.$

The consistence degree is computed by $Con_{R,\alpha}(D)(x) = R_S^\alpha[x]_D(x)$. Let $\alpha = 0.05$; the lower approximations of each decision class and the consistence degree of each object are listed in Table 2. Table 2 shows that the consistence degree of each object takes its value at the lower approximation of its corresponding decision class.

TABLE 2. The Lower Approximation Value and the Consistence Degree.

The formula to compute the discernibility vector of $x_i$ is $c_{ji} = \{a : \max\{a(x_i, x_j) - 1 + \lambda, 0\} \le \alpha\}$ for $D(x_i, x_j) = 0$, where $\lambda = R_S^\alpha[x_i]_D(x_i)$, and $c_{ji} = \emptyset$ for $D(x_i, x_j) = 1$. The discernibility vectors are listed in Table 3 (each column in Table 3 corresponds to one discernibility vector). An element of a discernibility vector is an attribute subset in which each attribute can distinguish one pair of objects. For example, the element in the first column and third row of Table 3 is the attribute subset $\{a, c\}$: each of the attributes $\{a\}$ and $\{c\}$ can distinguish $x_1$ from $x_3$ without causing new inconsistence.

TABLE 3. The Discernibility Vector of Every Object (All Vectors Become a Matrix).

The attribute value core is the collection of the most important attribute values for distinguishing one object from the others. The attribute value core of each object is listed in Table 4. The value core collects the single-attribute elements of the discernibility vector. For example, the value core of the object $x_3$ is $\{(a, 0), (f, 0)\}$, in which the attribute subset $\{a, f\}$ is the collection of the single-attribute elements of the discernibility vector of $x_3$.

TABLE 4. The Attribute Value Core of Every Object.

Using Algorithm 4.2, we compute one near-optimal attribute value reduction of each object (see Table 5). Table 5 shows that a fuzzy decision table can be significantly reduced by attribute value reduction.

TABLE 5. Attribute Value Reduction of Every Object.

4.2 Rule Induction from the Reduced Decision Table

In a fuzzy decision table $FD = (U, R \cup D)$, each original object can be represented in the form of a decision rule. The rule corresponding to the object $x$ is denoted by $fd_x$. The restriction of $fd_x$ to $R$, denoted by $fd_x|R$, is defined by "$N(R(x, y)) < Con_{R,\alpha}(D)(x)$ on $R(x) = \{a(x) : a \in R\}$," and the restriction of $fd_x$ to $D$ is denoted by $fd_x|D = \wedge\{d(x) \mid d \in D\}$. They are called the condition and the decision of $fd_x$, respectively. Thus, a fuzzy decision rule $fd_x$ can be denoted by $fd_x|R \to fd_x|D$.

Example 4.2. Let us consider Example 4.1 again. We illustrate the rule representation of the object $x_1$ in a fuzzy decision table as follows. Since the object $x_1$ belongs to "decision class 0," its lower approximation on "decision class 0" is 0.88889, and its consistence degree is 0.88889, the original decision rule corresponding to the object $x_1$ can be stated in the following form:

$fd_{x_1}|R \to fd_{x_1}|D$: if $1 - R(y, x_1) < 0.88889$ on $x_1 = \{(a, 1), (b, 0.1), (c, 0.11111), (d, 0.88889), (e, 0.11111), (f, 0.11111), (g, 1), (h, 0.11111), (k, 0), (j, 0.88889)\}$, then the object $y$ belongs to "decision class 0."

This rule says that if the distance between $y$ and $x$ is smaller than a certain value, then $y$ belongs to the decision class containing $x$. For example, since $1 - R(x_6, x_1) = 0.66667 < 0.88889$, the object $x_6$ belongs to "decision class 0" by the decision rule $fd_{x_1}|R \to fd_{x_1}|D$; since $1 - R(x_2, x_1) = 1 > 0.88889$, the decision class of the object $x_2$ cannot be predicted by the decision rule $fd_{x_1}|R \to fd_{x_1}|D$.

This example shows that the decision rule corresponding to each original object is very cumbersome. It is necessary to reduce such rules from the viewpoint of attribute value reduction.

Definition 4.4 (reduction rule). Given an object $x$ and its corresponding rule $fd_x|R \to fd_x|D$ in $FD = (U, R \cup D)$, suppose $P(x) \subseteq R(x)$ is one attribute value reduction of $x$; then $fd_x|P \to fd_x|D$ is called the reduction rule of $fd_x|R \to fd_x|D$.

This definition shows that each attribute value reduction corresponds to a reduction rule.

Example 4.3. Let us consider Example 4.1 again. Suppose one attribute value reduction of the object $x_1$ is $P(x_1) = \{(a, 1), (d, 0.88889)\}$; then its corresponding reduction rule is represented as follows:

$fd_{x_1}|P \to fd_{x_1}|D$: if $1 - P(y, x_1) < 0.88889$ on $x_1 = \{(a, 1), (d, 0.88889)\}$ (here, $P = \{a, d\}$), then the object $y$ belongs to "decision class 0."

This rule says that if the object $y$ is similar to the reduced object $x_1 = \{(a, 1), (d, 0.88889)\}$, then $y$ belongs to the decision class containing $x$, i.e., $[x]_D$.
as follows:
Definition 4.5 (rule covering). Given an object x and its If 1  P ðy; xÞ < 0:77778 on
corresponding rule fdx jR ! fdx jD in F D ¼ ðU; R [ DÞ, for
an arbitrary object y 2 U, if NðRðx; yÞÞ < ConR; ðDÞðxÞ, x ¼ fða; 0:11111Þ; ðj; 0:77778Þg
then we say that fdx jR ! fdx jD covers the object y. We also
(here P ¼ fa; jg), then y belongs to “decision class 1.”
say that fdx jR ! fdx jD covers the rule corresponding to the
If 1  P ðy; xÞ < 0:77778 on x ¼ fðc; 1Þ; ðj; 0Þg (here,
object y.
P ¼ fc; jg), then y belongs to “decision class 0.”
If 1  P ðy; xÞÞ < 0:88889 on x ¼ fða; 1Þ; ðd; 0:88889Þg
If the rule fdx jR ! fdx jD covers the object y, then the
(here, P ¼ fa; dg), then y belongs to “decision class 0.”
object y belongs to the decision class of x, i.e., y 2 ½xD , if we
If 1  P ðy; xÞ < 1 on x ¼ fða; 0Þ; ðf; 0Þg (here, P ¼
ignore some misclassification and perturbation.
fa; fg), then y belongs to “decision class 1.”
Now, we present one theorem which shows the relation
This rule set can also be represented in a table (see
between the original decision rule and the reduction rule.
Table 6).
Theorem 4.6. Given an object x and its corresponding rule
fdx jP ! fdx jD in F D ¼ ðU; R [ DÞ, If the rule fdx jP ! The rule set obtained by using GFRS can be used as a
fdx jD covers one object y 2 U, then its reduction rule classifier to predict the unseen objects. In the following,
fdx jP ! fdx jD also covers the object y. the rule-based classifier obtained by using GFRS is
Proof. Since SðNðRðx; yÞÞ; 0Þ < ConR; ðDÞðxÞ and denoted by GFRSC.

ConR; ðDÞðxÞ ¼ ConP ; ðDÞðxÞ;


5 SCALABILITY ANALYSIS AND EXPERIMENTAL
we have SðNðP ðx; yÞÞ; 0Þ  ConP ; ðDÞðxÞ (by Rðx; yÞ  COMPARISON
P ðx; yÞ). Thus, the reduction rule fdx jP ! fdx jD covers In this section, we first analyze the scalability complexity of
the object y. u
t GFRSC, and then, experimentally compare it with several


rule-based classifiers. Since GFRS is the generalization of RS and FRS, it is necessary to compare against the rule-based classifiers constructed using RS and FRS (denoted by RSC and FRSC, respectively). We also compare GFRSC with one popular rule-based classifier, the decision tree (denoted by DTC).

5.1 Scalability Analysis

Fig. 3. The procedure and scalability analysis of rule induction.

Fig. 3 shows the procedure for inducing rules using GFRSC and an analysis of its scalability. The figure is similar to those for RSC and FRSC; the main difference lies in the step covered by the dotted-line box, i.e., preprocessing. In the fuzzy procedures, i.e., FRSC and GFRSC, preprocessing means computing the similarity relation and the lower approximation operator; the time and space complexities of these steps are $O(|U|^2 \times |A|)$ and $O(|U|^2 \times |D|)$, respectively. In the crisp case, i.e., RSC, preprocessing means the discretization of the data sets; its time and space complexities depend on the discretization method that is used.

5.2 Experimental Comparison

In this section, we experimentally compare GFRSC with some other rule-based classifiers: RSC, FRSC, and DTC. First, we specify one reasonable triangular norm for GFRSC. In GFRS (see Section 3), there are two lower approximation operators ($R_\vartheta^\alpha$ and $R_S^\alpha$) and two upper approximation operators ($R_T^\alpha$ and $R_\sigma^\alpha$); generally, they are not equal to each other. If we specify the Lukasiewicz $T$-norm $T_L$, then its dual conorm with respect to $N_S$ is the bounded sum $S_L(x, y) = \min\{1, x + y\}$, the $T_L$-residuated implication is $\vartheta_L(x, y) = \min\{1, 1 - x + y\}$, and the dual of $\vartheta_L$ with respect to the negator $N(x) = 1 - x$ is $\sigma_L(x, y) = \max\{0, y - x\}$. Clearly, we have $S_L(1 - x, y) = \vartheta_L(x, y)$ and $\sigma_L(1 - x, y) = T_L(x, y)$, which imply that $R_{\vartheta_L}^\alpha = R_{S_L}^\alpha$ and $R_{T_L}^\alpha = R_{\sigma_L}^\alpha$ hold for a fuzzy $T_L$-similarity relation $R$ and the standard negator $N(x) = 1 - x$. This is one reason why we specify the Lukasiewicz $T$-norm $T_L$ in GFRSC. Furthermore, many discussions on the selection of the triangular norm $T$ recommend the Lukasiewicz $T$-norm $T_L$ as a suitable choice [15], [53], [54]; for more on this, interested readers can refer to [24]. In the following experiments, we adopt the $S$-lower approximation operators constructed on $T_L$ to design the rule-based classifier. The discernibility vector is then specified as follows:

$c_{ji} = \{a : a(x_i, x_j) + \lambda - 1 \le \alpha\}$ for $D(x_i, x_j) = 0$, where $\lambda = R_{S_L}^\alpha([x_i]_D)(x_i)$;  (V5.1)

$c_{ji} = \emptyset$, for $D(x_i, x_j) = 1$.  (V5.2)
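The identities above are easy to verify numerically; the following check (ours) also restates the membership test of (V5.1) as plain arithmetic.

```python
# Check S_L(1-x, y) = theta_L(x, y) and sigma_L(1-x, y) = T_L(x, y),
# which make the S-lower and theta-lower operators coincide under N(x)=1-x.
import itertools

T_L     = lambda x, y: max(0.0, x + y - 1.0)
S_L     = lambda x, y: min(1.0, x + y)
theta_L = lambda x, y: min(1.0, 1.0 - x + y)
sigma_L = lambda x, y: max(0.0, y - x)

grid = [i / 20 for i in range(21)]
for x, y in itertools.product(grid, grid):
    assert abs(S_L(1 - x, y) - theta_L(x, y)) < 1e-12
    assert abs(sigma_L(1 - x, y) - T_L(x, y)) < 1e-12

# (V5.1) then reduces to simple arithmetic: attribute a enters c_ji iff
# sigma_L(1 - a(x_i,x_j), lam) = max(0, a(x_i,x_j) + lam - 1) <= alpha.
in_c = lambda sim, lam, alpha: max(0.0, sim + lam - 1.0) <= alpha
print(in_c(0.2, 0.7, 0.05), in_c(0.9, 0.7, 0.05))   # True False
```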
The experiments are set up as follows:

Data set. Four data sets from the UCI Machine Learning Repository [49] are used to compare the performance of the classifiers. The details are provided in Table 7.

TABLE 7. The Information of Some Data Sets from the UCI Machine Learning Repository.

Data set split. Each data set is split into two parts: a randomly chosen 50 percent of the objects is used as the training set to find the classifier, and the remainder is used as the testing set to test the accuracy of the found classifier.

Noise control. Some noise (misclassification and perturbation) is added to the training sets, since GFRSC focuses on dealing with the problems of noise. An added-noise percentage of $\beta \times 100\%$ means that in a randomly selected $\beta \times 100\%$ of the data, the decision classes are changed and the condition attributes are perturbed.

Indexes. The indexes are: 1) the number of selected rules and 2) the classification accuracy of the rule set.

Parameter specification. We vary the parameter "alpha" in GFRSC from 0 to 0.15 with step 0.01, and the noise percentage "beta" from 0 to 0.2 with step 0.05.

5.2.1 Discussion of the Parameters: Alpha and Beta

There are two parameters in the following experiments: the threshold "alpha" in GFRSC and the noise percentage "beta." In this section, we analyze the effect of alpha and beta on the classification performance of GFRSC. Figs. 4 and 5 show the classification results on the four data sets. The x-axis is the value of alpha; the y-axis in Fig. 4 is the number of rules, and the y-axis in Fig. 5 is the classification accuracy. There are five curves in each subfigure, which correspond to the classification results for different beta values. Note that the points on the y-axis are the accuracy (or the number of rules) of FRSC (since GFRS degenerates to the classifier on FRS when alpha is set to 0), which provides base points for evaluating the performance of GFRSC.

Fig. 4 shows how the number-of-rules curves change with alpha and beta. We find that the number of rules decreases monotonically with the value of alpha. That is, the
Authorized licensed use limited to: GS INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on July 22,2010 at 11:40:55 UTC from IEEE Xplore. Restrictions apply.
634 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 5, MAY 2010

Fig. 4. The valuation of number of rules of GFRSC on different values of Fig. 5. The valuation of number of rules of GFRSC on different values of
“alpha” and “beta.” (a) WDBC. (b) Diabetes. (c) Ionosphere. (d) Wine. “alpha” and “beta.” (a) WDBC. (b) Diabetes. (c) Ionosphere. (d) Wine.

bigger the parameter in GFRSC is, the compacter the obtained to keep the discernibility information in the data sets
rule set in GFRSC is. What is more, we find that on the y-axis, invariant before and after discretization (e.g., the
the number of rules changes dramatically with the percen- discretization method proposed by Nguyen which
tage of noise (i.e., beta), whereas it changes slightly when the preserves the discernibility information in the
value of alpha is large. This shows that from the viewpoint of
original data sets [57]), while the other is a common
number of rules, FRSC (i.e., alpha ¼ 0) is sensitive to the
discretization method (e.g., fuzzy C-mean). In this
percentage of noise, whereas GFRSC is less sensitive.
Fig. 5 shows that the accuracy of GFRSC curves with the paper, we select fuzzy C-mean as the discretization
change of alpha and beta. We find that in most cases, the method since some perturbation in the real-valued
accuracy curves of GFRSC share a similar pattern. At the data sets can be omitted by it. That is to say, RSC is
beginning, the accuracies increase dramatically, and then, less sensitive to perturbation in this paper.
change slightly. But when the value of alpha is too large, 2. FRS is one special case of GFRS. The main difference
the accuracy decreases perceptibly. This shows that within between them is the threshold introduced into the
a certain extent of alpha, the accuracy is stable and fuzzy approximation operators. By controlling this
reasonable. The extent of alpha depends on the domain of threshold, some misclassification and perturbation
specific practical problems. What is more, we also find that are ignored in the step of knowledge representation.
on y-axis, the accuracies change dramatically with the That is to say, GFRS is less sensitive to misclassifica-
percentage of noise (i.e., beta), whereas it changes slightly tion and perturbation.
when the value of alpha is large. These show that from the In this section, we experimentally compare
viewpoint of accuracy, FRSC is sensitive to the percentage GFRSC with RSC and FRSC. Here, we focus on
of noise; whereas GFRSC is less sensitive. comparing the sensitivity, the number of rules, and
The above analyses show that the noise in the data sets classification accuracy of the obtained classifier. The
significantly affects the performance of FRS, whereas it has comparison results are summarized in Figs. 6 and 7.
slight influence in GFRS. The basic reason is that the noise The x-axis in Figs. 6 and 7 is the noise percentage,
(i.e., misclassification and perturbation) is effectively con- i.e., beta. The y-axis in Fig. 6 is the number of rules
trolled in the step of knowledge representation in GFRS: and the y-axis in Fig. 7 is the classification accuracy.
misclassification or small perturbation is ignored by control-
In each subfigure, there are seven curves which
ling the threshold  in the fuzzy cut set ðR x Þ since the
correspond to the classification results on different
threshold  controls the accuracy of R x included in A.
classifiers, i.e., RSC, FRSC, and GFRSC on different
5.2.2 Experimental Comparison GFRSC with RSC and values of alpha.
FRSC Figs. 6 and 7 show that in most cases, the number
of rules and accuracy of RSC and FRSC increase
First of all, we would like to list the main differences among
significantly with the increment of noise, whereas
GFRS, RS, and FRS.
the number of rules and accuracy of GFRSC increase
1. FRS and GFRS can handle real-valued data sets, slightly. Except the data set “Diabetes,” other three
whereas RS can only deal with symbolic data sets. data sets share these patterns. This shows that FRSC
Therefore, RS need a preprocessing of discretization. and RSC are sensitive to noise, whereas GFRSC is
The selection of the discretization method affects less sensitive.
the performance of RSC. Generally, there are Fig. 6 shows that GFRSC finds a compacter rule
two types of discretization methods for RS. One is set than FRSC and RSC. It also shows that in most


cases, the curves obtained by GFRSC are lower than the curves obtained by FRSC and RSC. That is to say, from the viewpoint of the number of rules, the larger the percentage of noise is, the better the performance of GFRSC over FRSC and RSC.

Fig. 7 shows that GFRSC achieves higher accuracy than FRSC and RSC. We find that in most cases, the accuracy curves of GFRSC are higher than those of FRSC and RSC. The larger the percentage of noise is, the greater the advantage in accuracy of GFRSC over FRSC and RSC.

All the above analyses show that GFRSC performs better than FRSC and RSC in terms of noise sensitivity, number of rules, and accuracy.

Fig. 6. The valuation of the number of rules on GFRSC, RSC, and FRSC. (a) WDBC. (b) Diabetes. (c) Ionosphere. (d) Wine.

Fig. 7. The valuation of accuracy on GFRSC, RSC, and FRSC. (a) WDBC. (b) Diabetes. (c) Ionosphere. (d) Wine.

5.2.3 Experimental Comparison of GFRSC with Fuzzy Decision Tree

The decision tree is a very popular rule-based classifier (denoted here as DTC) [50], [51], [52], so it is necessary to compare our proposed rule-based classifier GFRSC with DTC. See5, the successor of the successful and widely used ID3 and C4.5 systems (a fuzzy decision tree learning algorithm), is selected for comparison with GFRSC, since it can deal sensibly with missing values and uses pruning to handle noisy data.

DTC and GFRS are two very different types of classifier-building systems: 1) DTC obtains a set of rules by constructing one tree, whereas GFRS obtains a set of rules by reducing redundant attribute values; 2) DTC controls the noise by pruning after the tree has been built, whereas GFRSC controls the noise before reducing the redundant attribute values, i.e., in the first step of knowledge representation.

The classification results of GFRSC with a number of rules similar to that of DTC are selected for comparison with DTC. The comparison results are summarized in Table 8.

TABLE 8. The Valuation of the Comparison of DTC and GFRSC.

Table 8 shows that GFRSC performs better than DTC on the noised data. The bold and italic entries in Table 8 are the accuracy values of GFRSC that are noticeably higher than those of DTC. For example, GFRSC performs significantly better than DTC on the noised data sets "WDBC" and "Wine." The reason may be that DTC controls the noise in the last step, whereas GFRSC controls the noise in the first step.

The computational complexity of DTC is $O(|U| \times |A| \times |T|)$ (here, $|T|$ is the size of the induced tree), whereas that of GFRSC is $O(|U|^2 \times |A|)$. The strength of DTC is that it performs well on data sets without noise and executes faster than GFRSC. The strength of GFRSC is that it is a


robust model because it is less sensitive to misclassification involutive iff NðNðxÞÞ ¼ x for all x 2 ½0; 1; it is called
and perturbation. These two classifiers suit different types weakly involutive iff NðNðxÞÞ  x for all x 2 ½0; 1. The
of real applications. standard negator is defined as NS ðxÞ ¼ 1  x. Given a
negator N, T -norm T and T -conorm S are called dual w.r.t.
6 CONCLUSION AND FUTURE WORK N iff De Morgan laws are satisfied, i.e., SðNðxÞ; NðyÞÞ ¼
In this paper, we develop a rule-based classifier using a novel framework, GFRS. We first propose a robust model of FRS (i.e., GFRS) and then design a classifier-building method based on GFRS. The key idea of building the classifier is to keep the discernibility information in the original data sets invariant. The proposed GFRS model, as the foundation of classifier building, has a sound mathematical rationale. By strict mathematical reasoning, we not only show the reasonability of the new concept "consistence degree," but also design a discernibility vector. This is the main contribution of this paper. One advantage of this paper is that the classifier GFRSC is robust, since GFRS is robust with respect to misclassification and perturbation. The main advantage of GFRS is that knowledge representation and knowledge discovery are closely related; that is, the fuzzy lower approximation operator (i.e., the consistence degree) is used in attribute value reduction. This point distinguishes GFRSC from other classifiers built using FRS.
In the future, we would like to do more work on GFRS. One piece of future work is to give a more detailed discussion of the parameter in GFRS. At present, one guideline for specifying this parameter is that, within a certain reasonable range, the larger the parameter in GFRSC is, the more compact the obtained rule set is. However, it is difficult to specify this range precisely.
Another piece of future work is a discussion of GFRS in relation to probability theory. Our proposed GFRS model is intended to address data perturbation. If the data perturbation is due to randomness, then another formal approach may assume that the fuzzy data are obtained under a random process. Therefore, a framework based on the notion of random fuzzy-rough sets may be developed.

APPENDIX
Here, we present and exemplify some notions of fuzzy logical operators [4], [10], [11], [55], namely the triangular norm, triangular conorm, negator, dual, $T$-residuated implication, and its dual operation.

A triangular norm, or $T$-norm for short, is a function $T : [0,1] \times [0,1] \to [0,1]$ that satisfies the following conditions: monotonicity ($T(x,y) \le T(\alpha,\beta)$ whenever $x \le \alpha$ and $y \le \beta$); commutativity ($T(x,y) = T(y,x)$); associativity ($T(T(x,y),z) = T(x,T(y,z))$); and the boundary condition ($T(x,1) = x$). The most popular continuous $T$-norms include the standard min operator (the largest $T$-norm), $T_M(x,y) = \min\{x,y\}$, and the bounded intersection (also called the Lukasiewicz $T$-norm), $T_L(x,y) = \max\{0, x+y-1\}$.

A triangular conorm, or $T$-conorm for short, is an increasing, commutative, and associative function $S : [0,1] \times [0,1] \to [0,1]$ that satisfies the boundary condition $S(x,0) = x$ for all $x \in [0,1]$. The well-known continuous $T$-conorms include the standard max operator (the smallest $T$-conorm), $S_M(x,y) = \max\{x,y\}$, and the bounded sum, $S_L(x,y) = \min\{1, x+y\}$.
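For concreteness, the following Python sketch (an illustration added here, not part of the formal development; all function names are ours) implements $T_M$, $T_L$, $S_M$, and $S_L$ and spot-checks the defining axioms numerically on a sample grid.

```python
# Illustrative sketch (ours): the standard and Lukasiewicz T-norms and
# T-conorms, with a brute-force numerical check of their axioms.
import itertools

def t_min(x, y):   # T_M(x, y) = min{x, y}, the largest T-norm
    return min(x, y)

def t_luk(x, y):   # T_L(x, y) = max{0, x + y - 1}, the bounded intersection
    return max(0.0, x + y - 1.0)

def s_max(x, y):   # S_M(x, y) = max{x, y}, the smallest T-conorm
    return max(x, y)

def s_luk(x, y):   # S_L(x, y) = min{1, x + y}, the bounded sum
    return min(1.0, x + y)

grid = [i / 10 for i in range(11)]
for T in (t_min, t_luk):
    for x, y, z in itertools.product(grid, repeat=3):
        assert abs(T(x, y) - T(y, x)) < 1e-9              # commutativity
        assert abs(T(T(x, y), z) - T(x, T(y, z))) < 1e-9  # associativity
        assert abs(T(x, 1.0) - x) < 1e-9                  # boundary: T(x, 1) = x
        if z >= y:
            assert T(x, z) >= T(x, y) - 1e-9              # monotonicity
for S in (s_max, s_luk):
    for x in grid:
        assert abs(S(x, 0.0) - x) < 1e-9                  # boundary: S(x, 0) = x
print("T-norm and T-conorm axioms hold on the sample grid")
```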
A negator $N$ is a decreasing function $N : [0,1] \to [0,1]$ that satisfies $N(0) = 1$ and $N(1) = 0$. A negator $N$ is called involutive iff $N(N(x)) = x$ for all $x \in [0,1]$; it is called weakly involutive iff $N(N(x)) \le x$ for all $x \in [0,1]$. The standard negator is defined as $N_S(x) = 1 - x$. Given a negator $N$, a $T$-norm $T$ and a $T$-conorm $S$ are called dual w.r.t. $N$ iff the De Morgan laws are satisfied, i.e., $S(N(x),N(y)) = N(T(x,y))$ and $T(N(x),N(y)) = N(S(x,y))$.
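The duality of the pairs $(T_M, S_M)$ and $(T_L, S_L)$ w.r.t. the standard negator can likewise be checked numerically; the sketch below (again ours and purely illustrative) verifies both De Morgan laws on a sample grid.

```python
# Illustrative sketch (ours): check the De Morgan duality
#   S(N(x), N(y)) = N(T(x, y))  and  T(N(x), N(y)) = N(S(x, y))
# for the (T_M, S_M) and (T_L, S_L) pairs under the standard negator.
n_std = lambda x: 1.0 - x                      # N_S(x) = 1 - x
t_min = lambda x, y: min(x, y)                 # T_M
s_max = lambda x, y: max(x, y)                 # S_M
t_luk = lambda x, y: max(0.0, x + y - 1.0)     # T_L
s_luk = lambda x, y: min(1.0, x + y)           # S_L

grid = [i / 10 for i in range(11)]
for T, S in [(t_min, s_max), (t_luk, s_luk)]:  # dual pairs w.r.t. N_S
    for x in grid:
        for y in grid:
            assert abs(S(n_std(x), n_std(y)) - n_std(T(x, y))) < 1e-9
            assert abs(T(n_std(x), n_std(y)) - n_std(S(x, y))) < 1e-9
print("De Morgan duality holds on the sample grid")
```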

Given a lower semicontinuous triangular norm $T$, the residuation implication, also called the $T$-residuated implication, is a function $\vartheta : [0,1] \times [0,1] \to [0,1]$ that satisfies $\vartheta(x,y) = \sup\{z \mid z \in [0,1], T(x,z) \le y\}$ for every $x,y \in [0,1]$. Two examples of $T$-residuated implications are the Gödel implication $\vartheta_M$, based on $T_M$:

$$\vartheta_M(x,y) = \begin{cases} 1, & x \le y, \\ y, & x > y, \end{cases}$$

and the Lukasiewicz implication $\vartheta_L$, based on $T_L$: $\vartheta_L(x,y) = \min\{1-x+y, 1\}$.

Given an upper semicontinuous triangular conorm $S$, the dual of the $T$-residuated implication w.r.t. $N$ is a function $\sigma : [0,1] \times [0,1] \to [0,1]$ that satisfies $\sigma(x,y) = \inf\{z \mid z \in [0,1], S(x,z) \ge y\}$ for every $x,y \in [0,1]$. For an involutive negator $N$, $\sigma(N(x),N(y)) = N(\vartheta(x,y))$ and $\vartheta(N(x),N(y)) = N(\sigma(x,y))$ hold. Examples of the duals of $T$-residuated implications include the dual of the Gödel implication $\sigma_M$, based on $S_M$:

$$\sigma_M(x,y) = \begin{cases} 0, & x \ge y, \\ y, & x < y, \end{cases}$$

and the dual of the Lukasiewicz implication $\sigma_L$, based on $S_L$: $\sigma_L(x,y) = \max\{y-x, 0\}$, which follows directly from the $\inf$ definition with $S_L(x,z) = \min\{1, x+z\}$.
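The sup/inf definitions can be approximated by a grid search, which makes it easy to confirm the closed forms above; the following sketch (ours, illustrative only) compares a brute-force computation of $\vartheta$ and $\sigma$ with the Gödel and Lukasiewicz forms at a few sample points.

```python
# Illustrative sketch (ours): approximate
#   theta(x, y) = sup{ z : T(x, z) <= y }   and
#   sigma(x, y) = inf{ z : S(x, z) >= y }
# on a fine grid, then compare with the closed forms given above.
t_min = lambda x, y: min(x, y)                 # T_M
t_luk = lambda x, y: max(0.0, x + y - 1.0)     # T_L
s_max = lambda x, y: max(x, y)                 # S_M
s_luk = lambda x, y: min(1.0, x + y)           # S_L

def theta(T, x, y, steps=1000):
    zs = [k / steps for k in range(steps + 1)]
    return max(z for z in zs if T(x, z) <= y + 1e-12)

def sigma(S, x, y, steps=1000):
    zs = [k / steps for k in range(steps + 1)]
    return min(z for z in zs if S(x, z) >= y - 1e-12)

godel      = lambda x, y: 1.0 if x <= y else y   # theta_M, Godel implication
luk        = lambda x, y: min(1.0 - x + y, 1.0)  # theta_L, Lukasiewicz
godel_dual = lambda x, y: 0.0 if x >= y else y   # sigma_M, dual of Godel
luk_dual   = lambda x, y: max(y - x, 0.0)        # sigma_L, dual of Lukasiewicz

for x in (0.0, 0.2, 0.5, 0.8, 1.0):
    for y in (0.0, 0.3, 0.5, 0.7, 1.0):
        assert abs(theta(t_min, x, y) - godel(x, y)) < 1e-2
        assert abs(theta(t_luk, x, y) - luk(x, y)) < 1e-2
        assert abs(sigma(s_max, x, y) - godel_dual(x, y)) < 1e-2
        assert abs(sigma(s_luk, x, y) - luk_dual(x, y)) < 1e-2
print("closed forms match the sup/inf definitions at the sample points")
```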

ACKNOWLEDGMENTS

This work was supported in part by the Hong Kong RGC CERG research grants PolyU 5273/05E (B-Q943) and PolyU 5281/07E (B-Q06C). Degang Chen is supported by the NSFC (70871036). This work is also partially supported by the NSF of Hebei Province (F2008000635, 08963522D, 06213548).

REFERENCES

[1] Z. Pawlak and A. Skowron, "Rough Sets: Some Extensions," Information Sciences, vol. 177, pp. 28-40, 2007.
[2] L.A. Zadeh, "Fuzzy Sets," Information and Control, vol. 8, pp. 338-353, 1965.
[3] D. Dubois and H. Prade, "Rough Fuzzy Sets and Fuzzy Rough Sets," Int'l J. General Systems, vol. 17, pp. 191-208, 1990.
[4] D. Dubois and H. Prade, "Putting Rough Sets and Fuzzy Sets Together," Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, R. Slowinski, ed., pp. 203-232, Kluwer Academic Publishers, 1992.
[5] J.S. Mi and W.X. Zhang, "An Axiomatic Characterization of a Fuzzy Generalization of Rough Sets," Information Sciences, vol. 160, pp. 235-249, 2004.
[6] W.Z. Wu, J.S. Mi, and W.X. Zhang, "Generalized Fuzzy Rough Sets," Information Sciences, vol. 151, pp. 263-282, 2003.
[7] W.Z. Wu and W.X. Zhang, "Constructive and Axiomatic Approaches of Fuzzy Approximation Operators," Information Sciences, vol. 159, pp. 233-254, 2004.
[8] D.G. Chen, X.Z. Wang, D.S. Yeung, and E.C.C. Tsang, "Rough Approximations on a Complete Completely Distributive Lattice with Applications to Generalized Rough Sets," Information Sciences, vol. 176, pp. 1829-1848, 2006.
[9] D.S. Yeung, D.G. Chen, E.C.C. Tsang, J.W.T. Lee, and X.Z. Wang, "On the Generalization of Fuzzy Rough Sets," IEEE Trans. Fuzzy Systems, vol. 13, no. 3, pp. 343-361, June 2005.
[10] N.N. Morsi and M.M. Yakout, "Axiomatics for Fuzzy Rough Sets," Fuzzy Sets and Systems, vol. 100, pp. 327-342, 1998.
[11] A.M. Radzikowska and E.E. Kerre, "A Comparative Study of Fuzzy Rough Sets," Fuzzy Sets and Systems, vol. 126, pp. 137-155, 2002.
[12] M.D. Cock, C. Cornelis, and E.E. Kerre, "Fuzzy Rough Sets: The Forgotten Step," IEEE Trans. Fuzzy Systems, vol. 15, no. 1, pp. 121-130, Feb. 2007.
[13] C. Cornelis, M.D. Cock, and A.M. Radzikowska, "Fuzzy Rough Sets: From Theory into Practice," Handbook of Granular Computing, W. Pedrycz, A. Skowron, and V. Kreinovich, eds., Springer-Verlag, 2008.
[14] R. Slowinski and D. Vanderpooten, "A Generalized Definition of Rough Approximations Based on Similarity," IEEE Trans. Knowledge and Data Eng., vol. 12, no. 2, pp. 331-336, Mar./Apr. 2000.
[15] J.M.F. Salido and S. Murakami, "Rough Set Analysis of a General Type of Fuzzy Data Using Transitive Aggregations of Fuzzy Similarity Relations," Fuzzy Sets and Systems, vol. 139, pp. 635-660, 2003.
[16] Y.Y. Yao, "Combination of Rough and Fuzzy Sets Based on Alpha-Level Sets," Rough Sets and Data Mining: Analysis for Imprecise Data, T.Y. Lin and N. Cercone, eds., pp. 301-321, Kluwer Academic Publishers, 1997.
[17] Q.H. Hu, D.R. Yu, and Z.X. Xie, "Fuzzy Probabilistic Approximation Spaces and Their Information Measures," IEEE Trans. Fuzzy Systems, vol. 14, no. 2, pp. 191-201, Apr. 2006.
[18] R. Jensen and Q. Shen, "Fuzzy-Rough Attributes Reduction with Application to Web Categorization," Fuzzy Sets and Systems, vol. 141, pp. 469-485, 2004.
[19] Q. Shen and R. Jensen, "Selecting Informative Features with Fuzzy-Rough Sets and Its Application for Complex Systems Monitoring," Pattern Recognition, vol. 37, pp. 1351-1363, 2004.
[20] R. Jensen and Q. Shen, "Semantics-Preserving Dimensionality Reduction: Rough and Fuzzy-Rough-Based Approaches," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 12, pp. 1457-1471, Dec. 2004.
[21] R. Jensen and Q. Shen, "Fuzzy-Rough Sets Assisted Attribute Selection," IEEE Trans. Fuzzy Systems, vol. 15, no. 1, pp. 73-89, Feb. 2007.
[22] R. Jensen and Q. Shen, "New Approaches to Fuzzy-Rough Feature Selection," IEEE Trans. Fuzzy Systems, vol. 17, no. 4, pp. 824-838, Aug. 2009.
[23] E.C.C. Tsang, D.G. Chen, D.S. Yeung, X.Z. Wang, and J.W.T. Lee, "Attribute Reduction Using Fuzzy Rough Sets," IEEE Trans. Fuzzy Systems, vol. 16, no. 5, pp. 1130-1141, 2008.
[24] D.G. Chen, E.C.C. Tsang, and S.Y. Zhao, "Attributes Reduction with Fuzzy Rough Sets," Proc. 2007 IEEE Int'l Conf. Systems, Man, and Cybernetics, vol. 1, pp. 486-491, 2007.
[25] Q.H. Hu, D.R. Yu, and Z.X. Xie, "Information-Preserving Hybrid Data Reduction Based on Fuzzy-Rough Techniques," Pattern Recognition Letters, vol. 27, pp. 414-423, 2006.
[26] Q.H. Hu, D.R. Yu, Z.X. Xie, and X.D. Li, "EROS: Ensemble Rough Subspaces," Pattern Recognition, vol. 40, pp. 3728-3739, 2007.
[27] Q.H. Hu, Z.X. Xie, and D.R. Yu, "Hybrid Attribute Reduction Based on a Novel Fuzzy-Rough Model and Information Granulation," Pattern Recognition, vol. 40, pp. 3509-3521, 2007.
[28] D. Slezak, "Approximate Reducts in Decision Tables," Proc. Sixth Int'l Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 1159-1164, 1996.
[29] D. Slezak, "Approximate Entropy Reducts," Fundamenta Informaticae, vol. 53, pp. 365-390, 2002.
[30] H.S. Nguyen and D. Slezak, "Approximate Reducts and Association Rules: Correspondence and Complexity Results," Proc. Int'l Workshop New Directions in Rough Sets, Data Mining, and Granular-Soft Computing (RSFDGrC '99), N. Zhong, A. Skowron, and S. Ohsuga, eds., pp. 137-145, 1999.
[31] C. Cornelis, G.H. Martín, R. Jensen, and D. Slezak, "Feature Selection with Fuzzy Decision Reducts," Proc. Third Int'l Conf. Rough Sets and Knowledge Technology (RSKT '08), 2008.
[32] E.C.C. Tsang, S.Y. Zhao, and J.W.T. Lee, "Rule Induction Based on Fuzzy Rough Sets," Proc. 2007 Int'l Conf. Machine Learning and Cybernetics, vol. 5, pp. 3028-3033, Aug. 2007.
[33] C. Cornelis and R. Jensen, "A Noise-Tolerant Approach to Fuzzy-Rough Feature Selection," Proc. 17th Int'l Conf. Fuzzy Systems (FUZZ-IEEE '08), 2008.
[34] R.B. Bhatt and M. Gopal, "On Fuzzy Rough Sets Approach to Feature Selection," Pattern Recognition Letters, vol. 26, pp. 1632-1640, 2005.
[35] Q. Shen and A. Chouchoulas, "A Rough-Fuzzy Approach for Generating Classification Rules," Pattern Recognition, vol. 35, no. 11, pp. 2425-2438, 2002.
[36] S.K. Pal and P. Mitra, "Case Generation Using Rough Sets with Fuzzy Representation," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 3, pp. 292-300, Mar. 2004.
[37] Y. Li, S.C.K. Shiu, and S.K. Pal, "Combining Feature Reduction and Case Selection in Building CBR Classifiers," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 2, pp. 415-429, Mar. 2006.
[38] Q.H. Hu, D.R. Yu, and Z.X. Xie, "Neighborhood Classifiers," Expert Systems with Applications, vol. 34, pp. 866-876, 2008.
[39] G.Q. Cao, S.C.K. Shiu, and X.Z. Wang, "A Fuzzy-Rough Approach for the Maintenance of Distributed Case-Based-Reasoning Systems," Soft Computing, vol. 7, no. 8, pp. 491-499, 2003.
[40] M. Sarkar, "Fuzzy-Rough Nearest Neighbor Algorithms in Classification," Fuzzy Sets and Systems, vol. 158, no. 19, pp. 2134-2152, 2007.
[41] R.B. Bhatt and M. Gopal, "FRCT: Fuzzy-Rough Classification Trees," Pattern Analysis and Applications, vol. 11, no. 1, pp. 73-88, 2008.
[42] R. Yasdi, "Learning Classification Rules from Database in the Context of Knowledge Acquisition and Representation," IEEE Trans. Knowledge and Data Eng., vol. 3, no. 3, pp. 293-306, Sept. 1991.
[43] Y.C. Tsai, C.H. Cheng, and J.R. Chang, "Entropy-Based Fuzzy Rough Classification Approach for Extracting Classification Rules," Expert Systems with Applications, vol. 31, no. 2, pp. 436-443, 2006.
[44] E.C.C. Tsang and S.Y. Zhao, "Decision Table Reduction in KDD: Fuzzy Rough Based Approach," Trans. Rough Sets, Lecture Notes in Computer Science, vol. 5946, pp. 177-188, 2010.
[45] X.Z. Wang, E.C.C. Tsang, S.Y. Zhao, D.G. Chen, and D.S. Yeung, "Learning Fuzzy Rules from Fuzzy Samples Based on Rough Set Technique," Information Sciences, vol. 177, no. 20, pp. 4493-4514, 2007.
[46] S.Y. Zhao, E.C.C. Tsang, and D.G. Chen, "The Model of Fuzzy Variable Precision Rough Sets," IEEE Trans. Fuzzy Systems, vol. 17, no. 2, pp. 451-467, Apr. 2009.
[47] A. Skowron and C. Rauszer, "The Discernibility Matrices and Functions in Information Systems," Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, R. Slowinski, ed., pp. 331-362, Kluwer Academic Publishers, 1992.
[48] Y.Y. Yao, Y. Zhao, and J. Wang, "On Reduct Construction Algorithms," Proc. First Int'l Conf. Rough Sets and Knowledge Technology (RSKT '06), pp. 297-304, July 2006.
[49] UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2008.
[50] S.R. Safavian and D. Landgrebe, "A Survey of Decision Tree Classifier Methodology," IEEE Trans. Systems, Man, and Cybernetics, vol. 21, no. 3, pp. 660-674, May/June 1991.
[51] Y.F. Yuan and M.J. Shaw, "Introduction of Fuzzy Decision Tree," Fuzzy Sets and Systems, vol. 69, pp. 125-139, 1995.
[52] J.R. Quinlan, See5, Release 1.13, http://www.rulequest.com/, 2009.
[53] J.C. Bezdek and J.O. Harris, "Fuzzy Partitions and Relations: An Axiomatic Basis of Clustering," Fuzzy Sets and Systems, vol. 84, pp. 143-153, 1996.
[54] T. Sudkamp, "Similarity, Interpolation, and Fuzzy Rule Construction," Fuzzy Sets and Systems, vol. 58, pp. 73-86, 1993.
[55] G.J. Klir and B. Yuan, Fuzzy Logic: Theory and Applications. Prentice-Hall, 1995.
[56] W. Zhu and F.Y. Wang, "On Three Types of Covering-Based Rough Sets," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 8, pp. 1131-1144, Aug. 2007.
[57] S.H. Nguyen and H.S. Nguyen, "Some Effective Algorithms for Rough Set Methods," Proc. Int'l Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 1451-1456, 1996.


Suyun Zhao (M'06) received the bachelor's and master's degrees from the School of Mathematics and Computer Science, Hebei University, Baoding, China, in 2001 and 2004, respectively. She is currently working toward the PhD degree in the Department of Computing, The Hong Kong Polytechnic University. Her research interests are in the areas of machine learning, pattern recognition, fuzzy sets, and rough sets. She is a student member of the IEEE.

Eric C.C. Tsang received the BSc degree in computer studies from the City University of Hong Kong, Kowloon, in 1990, and the PhD degree in computing from The Hong Kong Polytechnic University, in 1996. He is an assistant professor in the Department of Computing, The Hong Kong Polytechnic University. His main research interests are in the areas of fuzzy expert systems, fuzzy neural networks, machine learning, genetic algorithms, and fuzzy support vector machines.

Degang Chen received the MS degree from the Northeast Normal University, Changchun, Jilin, China, in 1994, and the PhD degree from Harbin Institute of Technology, China, in 2000. He has worked as a postdoctoral fellow with Xi'an Jiaotong University, China, from 2000 to 2002 and with Tsinghua University, China, from 2002 to 2004. Since 2006, he has been working as a professor at North China Electric Power University, Beijing, China. His research interests include fuzzy groups, fuzzy analysis, rough sets, and SVM.

XiZhao Wang received the PhD degree in computer science from Harbin Institute of Technology, China, in 1998. He is presently the dean and a professor of the College of Mathematics and Computer Science, Hebei University, China. From September 1998 to September 2001, he served as a research fellow in the Department of Computing, The Hong Kong Polytechnic University. He became a full professor and the dean of the College of Mathematics and Computer Science at Hebei University in October 2001. His main research interests include learning from examples with fuzzy representation, fuzzy measures and integrals, neurofuzzy systems and genetic algorithms, feature extraction, multiclassifier fusion, and applications of machine learning. He has more than 160 publications, including four books, seven book chapters, and more than 90 journal papers in the IEEE Transactions on Pattern Analysis and Machine Intelligence, the IEEE Transactions on Systems, Man, and Cybernetics, the IEEE Transactions on Fuzzy Systems, Fuzzy Sets and Systems, Pattern Recognition, etc. He has been the PI/Co-PI for 16 research projects supported partially by the National Natural Science Foundation of China and the Research Grants Committee of the Hong Kong Government. He is an IEEE senior member (Board of Governors member in 2005, 2007-2009), the chair of the IEEE SMC Technical Committee on Computational Intelligence, an associate editor of the IEEE Transactions on SMC, Part B, an associate editor of Pattern Recognition and Artificial Intelligence, a member of the editorial board of Information Sciences, and an executive member of the Chinese Association of Artificial Intelligence. He was the recipient of the IEEE-SMCS Outstanding Contribution Award in 2004 and the IEEE-SMCS Best Associate Editor Award in 2006. He is the general cochair of the 2002-2009 International Conferences on Machine Learning and Cybernetics, cosponsored by the IEEE SMCS. He is a distinguished lecturer of the IEEE SMC Society.
