IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 5, MAY 2010
Building a Rule-Based
Classifier—A Fuzzy-Rough Set Approach
Suyun Zhao, Student Member, IEEE, Eric C.C. Tsang, Degang Chen, and
XiZhao Wang, Senior Member, IEEE
Abstract—The fuzzy-rough set (FRS) methodology, as a useful tool to handle discernibility and fuzziness, has been widely studied. Some researchers have studied the rough approximation of fuzzy sets, while others have focused on one application of FRS: attribute reduction (i.e., feature selection). However, constructing a classifier by using FRS, as another application of FRS, has been less studied. In this paper, we build a rule-based classifier by using one generalized FRS model after proposing a new concept named "consistence degree," which is used as the critical value to keep the discernibility information invariant in the process of rule induction. First, we generalize the existing FRS to a robust model with respect to misclassification and perturbation by incorporating one controlled threshold into the knowledge representation of FRS. Second, we propose a concept named "consistence degree" and, by strict mathematical reasoning, we show that this concept is reasonable as a critical value to reduce redundant attribute values in a database. By employing this concept, we then design a discernibility vector to develop the algorithms of rule induction. The induced rule set can function as a classifier. Finally, the experimental results show that the proposed rule-based classifier is feasible and effective on noisy data.
1 INTRODUCTION
(denoted by GFRS), which is used as the foundation to build a rule-based classifier. Second, we propose the consistence degree as a critical point to reduce attribute values, and strict mathematical reasoning shows that this concept is reasonable, since keeping it invariant guides the reduction of attribute values. Further, we propose a discernibility vector to find all attribute value reductions and investigate the structure of attribute value reductions. The above proposed concepts, i.e., consistence degree and discernibility vector, are the main contribution of this paper. Fourth, we develop some heuristic algorithms, which achieve near-optimal attribute value reductions and a near-minimal rule set, to build a rule-based classifier. Finally, we compare our proposed method with other rule-based classifiers: RS, FRS, and fuzzy decision tree. The experimental results show the feasibility and effectiveness of our proposed method on noisy data.

The rest of this paper is organized as follows: Section 2 reviews FRS. Section 3 generalizes the fuzzy approximation operators to a robust model named GFRS. Section 4 then proposes the concepts of attribute value reduction based on GFRS and designs a discernibility vector to investigate the structure of attribute value reductions. Also, it proposes a rule induction algorithm. In Section 5, the experimental comparisons between our proposed method and other rule-based classifiers are given. Section 6 concludes this paper.

2 REVIEW OF FUZZY-ROUGH SETS

Theories of RSs and fuzzy sets (FSs) are distinct and complementary ways to handle vagueness and uncertainty [4], [16]. RS describes the idea of indiscernibility between objects in a set, while FS models the ill-definition of the boundary of a subclass of this set. That is to say, they are not rival theories but capture two distinct aspects of imperfection in knowledge [4]. Many researchers were then inspired to combine them to handle indiscernibility and fuzziness [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. The results of these studies led to the introduction of the notions of FRS.

Generally speaking, FRS is composed of two parts. One is knowledge representation (i.e., the rough approximation of fuzzy sets), while the other is knowledge discovery (i.e., attribute reduction and rule induction). Most researchers have focused on the knowledge representation of FRS by using two approaches: the constructive and the axiomatic [3], [4], [5], [6], [7], [8], [9], [10], [11], [14]. The axiomatic approach takes the fuzzy approximation operators as the primary notion and focuses on studying the mathematical structure of FRS, such as its algebraic and topological structures. On the contrary, the constructive approach takes the replacements of the equivalence relation, such as a fuzzy binary relation, a fuzzy T-similarity relation, and a fuzzy weak equivalence partition, as the primary notion. Other notions, such as the lower and upper approximations, are constructed by using these replacements.

Summaries of FRS have been given in [1], [9], [13], [23], and interested readers can refer to them. In this paper, we just review some research on FRS summarized in [9], [23], which is helpful for us to build a rule-based classifier. As preliminaries, the fuzzy decision table used as the study platform of FRS is formulated as follows:

2.1 Fuzzy Decision Table

Let U be a nonempty set with finite objects. With every object, we associate a set of condition attributes R and a set of decision attributes D. The pair (U, R ∪ D) is called a decision table, denoted by DS. If some attributes in DS are fuzzy attributes, the decision table is called a fuzzy decision table, denoted by FD = (U, R ∪ D). Here, fuzzy attributes mean attributes with real-number values, since these values can be transferred to fuzzy values. For simplicity, we still use R to represent the set of fuzzy attributes.

In a fuzzy decision table FD = (U, R ∪ D), a(x) represents the value of x ∈ U on the attribute a, and P(x) = {a(x) : a ∈ P ⊆ R} represents the subset of attribute values of x ∈ U on the attribute subset P ⊆ R. With every P ⊆ R, we associate a binary relation P(x, y), called the fuzzy similarity relation of P, which is a binary relation satisfying reflexivity (P(x, x) = 1), symmetry (P(x, y) = P(y, x)), and T-transitivity (P(x, y) ≥ T(P(x, z), P(z, y))) for every x, y, z ∈ U. Actually, when the attribute values are symbolic, the fuzzy similarity relation degenerates to an equivalence relation, which generates a partition on U, denoted by U/P = {[x]_P | x ∈ U}, where [x]_P = {y ∈ U | a(x) = a(y), ∀a ∈ P} is the equivalence class containing x ∈ U.

In most practical applications, condition attributes often have real-number values, while decision attributes have symbolic values. This type of fuzzy decision table, with fuzzy condition attributes and symbolic decision attributes, thus serves as the platform to design the rule-based classifier by using FRS in this paper.

2.2 Fuzzy-Rough Sets

The concept of FRS was first proposed by Dubois and Prade [3], [4]. Their idea was as follows: Let U be a nonempty universe, R a fuzzy binary relation on U, and F(U) the fuzzy power set of U. A fuzzy-rough set is a pair (R_*(F), R^*(F)) of a fuzzy set F on U such that for every x ∈ U,

  R_*(F)(x) = inf_{y∈U} max{1 − R(x, y), F(y)};
  R^*(F)(x) = sup_{y∈U} min{R(x, y), F(y)}.

In [10], the above Dubois and Prade fuzzy-rough set was generalized from max, min to a residuated implicator ϑ and a general triangular norm T with respect to a fuzzy similarity relation R. The lower and upper approximation operators of a fuzzy set A are defined, for every A ∈ F(U), as

  R_ϑ A(x) = inf_{u∈U} ϑ(R(u, x), A(u));
  R_T A(x) = sup_{u∈U} T(R(u, x), A(u)).

However, another upper approximation operator was proposed to obtain the dual approximation operator of R_ϑ A, since R_ϑ A and R_T A were not dual with each other [5]. Suppose S is the dual triangular conorm of T; define R_σ(A)(x) = sup_{y∈U} σ(1 − R(x, y), A(y)), where σ is the dual of the residuated implication, as another kind of upper approximation of a fuzzy set A.

The aforementioned approximation operators can be summarized as the following four general fuzzy approximation operators. For every A ∈ F(U):

- T-upper approximation operator:

  R_T A(x) = sup_{u∈U} T(R(x, u), A(u));
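On a finite universe these operators reduce to element-wise infima and suprema. The following sketch evaluates the ϑ-lower and T-upper approximations; it is an illustration only, assuming the min t-norm with its Gödel residuated implicator and a made-up three-object universe (none of the numbers come from the paper):

```python
import numpy as np

def godel_implicator(x, y):
    # Residuated implicator of the min t-norm: theta(x, y) = 1 if x <= y, else y.
    return np.where(x <= y, 1.0, y)

def lower_approx(R, A):
    # R_theta A(x) = inf_u theta(R(u, x), A(u)), taken over the finite universe.
    return godel_implicator(R, A[:, None]).min(axis=0)

def upper_approx(R, A):
    # R_T A(x) = sup_u T(R(x, u), A(u)) with T = min.
    return np.minimum(R, A[None, :]).max(axis=1)

# A toy symmetric fuzzy similarity relation on U = {x1, x2, x3} (values assumed).
R = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
A = np.array([1.0, 0.9, 0.1])   # a fuzzy set on U

print(lower_approx(R, A))  # lower approximation, one value per object
print(upper_approx(R, A))  # upper approximation, one value per object
```

As expected, the lower approximation never exceeds the upper one on any object.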
Authorized licensed use limited to: GS INSTITUTE OF TECHNOLOGY AND SCIENCE. Downloaded on July 22,2010 at 11:40:55 UTC from IEEE Xplore. Restrictions apply.
ZHAO ET AL.: BUILDING A RULE-BASED CLASSIFIER—A FUZZY-ROUGH SET APPROACH 627
In this paper, we enlarge the critical value to R_S([x]_D)(x) computed in the proposed robust framework GFRS, since it enlarges the corresponding value in the original FRS. We then obtain the following statement: "if the distance between the objects x and y is smaller than the value R_S([x]_D)(x), then these two objects are consistent if we ignore some inconsistent objects caused by misclassification or perturbation." That is to say, if N(R(x, y)) < R_S([x]_D)(x), then [x]_D(y) = 1; if N(R(x, y)) ≥ R_S([x]_D)(x), then [x]_D(y) = 0 may happen. It is clear that R_S([x]_D)(x) is the critical value that keeps the object x consistent with the other objects when some misclassification and perturbation are ignored. Thus, we define the consistence degree of an object in a fuzzy decision table as follows:

Definition 4.1 (consistence degree). Given an object x in FD = (U, R ∪ D), let

Proof. If [z]_D = [x]_D, it is straightforward to get (R_x)_α ⊆ [x]_D. If [z]_D ≠ [x]_D, the membership of any x ∉ [z]_D in R_x falls below the threshold α, so the cut set is empty; in this case, we again get ((R_x)_α = ∅) ⊆ [x]_D. □

Lemma 4.2. For two T-similarity relations R and P, if R(x, u) ≤ P(x, u) for all x, u ∈ U, then the following statements hold: (L4.2.1) R_ϑ A ⊇ P_ϑ A; (L4.2.2) R_T A ⊆ P_T A.

Proof. For all x, u ∈ U, R(x, u) ≤ P(x, u) implies ϑ(R(x, u), A(u)) ≥ ϑ(P(x, u), A(u)), since the residuated implicator ϑ is non-increasing in its first argument; taking the infimum over u ∈ U gives R_ϑ A(x) ≥ P_ϑ A(x), which is (L4.2.1). Similarly, the monotonicity of T in its first argument gives T(R(x, u), A(u)) ≤ T(P(x, u), A(u)); taking the supremum over u ∈ U gives R_T A(x) ≤ P_T A(x), which is (L4.2.2). □
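The direction of Lemma 4.2 can be sanity-checked numerically. The sketch below uses the Gödel residuated implicator and the min t-norm with made-up relations satisfying R ≤ P pointwise (an illustration only, not code from the paper): the smaller relation yields the larger lower approximation and the smaller upper approximation.

```python
import numpy as np

def theta(x, y):
    # Goedel residuated implicator: 1 where x <= y, else y.
    return np.where(x <= y, 1.0, y)

def lower(Rel, A):
    # inf_u theta(Rel(u, x), A(u)) for each object x.
    return theta(Rel, A[:, None]).min(axis=0)

def upper(Rel, A):
    # sup_u min(Rel(x, u), A(u)) for each object x.
    return np.minimum(Rel, A[None, :]).max(axis=1)

# Two toy similarity relations with R <= P pointwise (values assumed).
P = np.array([[1.0, 0.7, 0.4],
              [0.7, 1.0, 0.5],
              [0.4, 0.5, 1.0]])
R = P * np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])   # shrink the off-diagonal entries
A = np.array([0.9, 0.2, 0.6])

# Lemma 4.2: a smaller relation gives a larger lower and a smaller upper approximation.
assert np.all(lower(R, A) >= lower(P, A))
assert np.all(upper(R, A) <= upper(P, A))
```

The assertions hold for any choice of A, because ϑ is non-increasing and T is non-decreasing in the relation argument.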
The proposed discernibility vector is similar to one column of the discernibility matrix in [47]. c_j is the collection of attributes that maintain the discernibility condition on the pair (x_i, x_j); each attribute in it can distinguish x_i from x_j.

The following theorems show that, by using the discernibility vector, we can find the value reductions and the core.

Theorem 4.3. Core(x) = {a(x) : c_j = {a} for some 1 ≤ j ≤ n}.

Proof. a(x) ∈ Core(x) ⇔ Con_R(D)(x) > Con_{R−{a}}(D)(x) ⇔ R_S([x]_D)(x) > (R−{a})_S([x]_D)(x). Let λ = R_S([x]_D)(x) and t = (R−{a})_S([x]_D)(x); then the four formulas λ > t, (R)_λ(x) ⊆ [x]_D, (R−{a})_t(x) ⊆ [x]_D, and (R−{a})_λ(x) ⊄ [x]_D hold. By Lemma 4.1, this is equivalent to the existence of y ∈ U (y ≠ x) for which the discernibility condition fails for every b ∈ R−{a} but holds for a, i.e., c_j = {a}. The statement c_j = {a} implies that a is the unique attribute that maintains the discernibility between x and y. □

In the following, we present the detailed algorithm to find all value reductions by using the discernibility vector.

We now design an algorithm to compute all attribute value reductions for one object by using the discernibility vector approach. For x_i ∈ U in FD = (U, R ∪ D), the algorithm to find all attribute value reductions of x_i is described in Algorithm 4.1. This algorithm can find all attribute value reductions of one object, but its computational complexity increases exponentially with the number of attributes (i.e., O(|U| · 2^|A|), where |U| is the number of objects and |A| is the number of attributes). In practical applications, it is not necessary to find all value reductions, because finding a near-optimal one is enough. We then provide a heuristic algorithm (see Algorithm 4.2) to find one near-optimal attribute value reduction by rewriting the part covered by the dot-lined box in Algorithm 4.1. The key idea of this algorithm uses the add-deletion strategy of reduct construction algorithms described in [48].

Algorithm 4.1: to find all attribute value reductions of x_i in FD = (U, R ∪ D)
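The exhaustive search and the add-deletion heuristic described above can be sketched as follows. This is an independent illustration, not the paper's Algorithms 4.1 and 4.2: the discernibility vector is assumed to be encoded as a list of attribute-name sets, with an empty set marking an object pair that needs no discrimination.

```python
from itertools import combinations

def all_value_reductions(vector, attributes):
    # Exhaustive search: every inclusion-minimal attribute subset that
    # intersects each nonempty entry of the discernibility vector.
    # This enumerates O(2^|A|) subsets, matching the stated complexity.
    entries = [c for c in vector if c]
    hitting = [set(B) for r in range(len(attributes) + 1)
               for B in combinations(attributes, r)
               if all(set(B) & c for c in entries)]
    return [B for B in hitting if not any(other < B for other in hitting)]

def near_optimal_reduction(vector, attributes):
    # Add step: repeatedly pick the attribute hitting the most uncovered
    # entries; deletion step: drop attributes that became redundant
    # (the add-deletion strategy of [48]).
    entries = [c for c in vector if c]
    chosen = set()
    while any(not (c & chosen) for c in entries):
        best = max(attributes, key=lambda a: sum(1 for c in entries
                                                 if a in c and not (c & chosen)))
        chosen.add(best)
    for a in sorted(chosen):
        if all((chosen - {a}) & c for c in entries):
            chosen.discard(a)
    return chosen

# Toy discernibility vector for one object (entries assumed for illustration).
vec = [{'a', 'c'}, {'a', 'f'}, {'f'}, set()]
print(all_value_reductions(vec, ['a', 'c', 'f']))    # all minimal value reductions
print(near_optimal_reduction(vec, ['a', 'c', 'f']))  # one near-optimal reduction
```

The greedy result is always one of the exhaustively enumerated reductions, but it is computed in polynomial time.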
TABLE 1
One Simple Decision Table

TABLE 2
The Lower Approximation Value and the Consistence Degree

TABLE 3
The Discernibility Vector of Every Object
(All Vectors Become a Matrix)
{a} or {c} can distinguish x1 from x3 without causing new inconsistence.

The attribute value core is the collection of the most important attribute values to distinguish one object from the others. The attribute value core of each object is listed in Table 4. The attributes in the value core are the collection of the elements with a single attribute in the discernibility vector. For example, the value core of object x3 is {(a, 0), (f, 0)}, in which the attribute subset {a, f} is the collection of the elements with a single attribute in the discernibility vector of x3.

Using Algorithm 4.2, we compute one near-optimal attribute value reduction of each object (see Table 5). Table 5

They are called the condition and decision of fd_x, respectively. Thus, a fuzzy decision rule fd_x can be denoted by fd_x|R → fd_x|D.

Example 4.2. Let us consider Example 4.1 again. We illustrate the rule representation of the object x1 in a fuzzy decision table as follows: Since the object x1 belongs to "decision class 0," its lower approximation on "decision class 0" is 0.88889 and its consistence degree is 0.88889, so the original decision rule corresponding to the object x1 can be stated in the following formulation:
TABLE 7
The Information of Some Data Sets from the UCI Machine Learning Repository
Fig. 4. The valuation of the number of rules of GFRSC on different values of "alpha" and "beta." (a) WDBC. (b) Diabetes. (c) Ionosphere. (d) Wine.

Fig. 5. The valuation of the accuracy of GFRSC on different values of "alpha" and "beta." (a) WDBC. (b) Diabetes. (c) Ionosphere. (d) Wine.
bigger the parameter in GFRSC is, the more compact the obtained rule set in GFRSC is. What is more, we find that on the y-axis, the number of rules changes dramatically with the percentage of noise (i.e., beta), whereas it changes slightly when the value of alpha is large. This shows that, from the viewpoint of the number of rules, FRSC (i.e., alpha = 0) is sensitive to the percentage of noise, whereas GFRSC is less sensitive.

Fig. 5 shows how the accuracy of GFRSC changes with alpha and beta. We find that in most cases, the accuracy curves of GFRSC share a similar pattern. At the beginning, the accuracies increase dramatically, and then change slightly. But when the value of alpha is too large, the accuracy decreases perceptibly. This shows that within a certain extent of alpha, the accuracy is stable and reasonable. The extent of alpha depends on the domain of specific practical problems. What is more, we also find that on the y-axis, the accuracies change dramatically with the percentage of noise (i.e., beta), whereas they change slightly when the value of alpha is large. These show that, from the viewpoint of accuracy, FRSC is sensitive to the percentage of noise, whereas GFRSC is less sensitive.

The above analyses show that the noise in the data sets significantly affects the performance of FRS, whereas it has slight influence in GFRS. The basic reason is that the noise (i.e., misclassification and perturbation) is effectively controlled in the step of knowledge representation in GFRS: misclassification or small perturbation is ignored by controlling the threshold in the fuzzy cut set (R_x)_α, since the threshold controls the accuracy of R_x included in A.

5.2.2 Experimental Comparison of GFRSC with RSC and FRSC

First of all, we would like to list the main differences among GFRS, RS, and FRS.

1. FRS and GFRS can handle real-valued data sets, whereas RS can only deal with symbolic data sets. Therefore, RS needs a preprocessing step of discretization. The selection of the discretization method affects the performance of RSC. Generally, there are two types of discretization methods for RS. One is to keep the discernibility information in the data sets invariant before and after discretization (e.g., the discretization method proposed by Nguyen, which preserves the discernibility information in the original data sets [57]), while the other is a common discretization method (e.g., fuzzy C-means). In this paper, we select fuzzy C-means as the discretization method, since some perturbation in the real-valued data sets can be smoothed out by it. That is to say, RSC is less sensitive to perturbation in this paper.

2. FRS is one special case of GFRS. The main difference between them is the threshold introduced into the fuzzy approximation operators. By controlling this threshold, some misclassification and perturbation are ignored in the step of knowledge representation. That is to say, GFRS is less sensitive to misclassification and perturbation.

In this section, we experimentally compare GFRSC with RSC and FRSC. Here, we focus on comparing the sensitivity, the number of rules, and the classification accuracy of the obtained classifiers. The comparison results are summarized in Figs. 6 and 7. The x-axis in Figs. 6 and 7 is the noise percentage, i.e., beta. The y-axis in Fig. 6 is the number of rules, and the y-axis in Fig. 7 is the classification accuracy. In each subfigure, there are seven curves, which correspond to the classification results of the different classifiers, i.e., RSC, FRSC, and GFRSC on different values of alpha.

Figs. 6 and 7 show that in most cases, the number of rules and accuracy of RSC and FRSC change significantly with the increment of noise, whereas the number of rules and accuracy of GFRSC change only slightly. Except for the data set "Diabetes," the other three data sets share these patterns. This shows that FRSC and RSC are sensitive to noise, whereas GFRSC is less sensitive.

Fig. 6 shows that GFRSC finds a more compact rule set than FRSC and RSC. It also shows that in most
TABLE 8
The Valuation of the Comparison of DTC and GFRSC
robust model because it is less sensitive to misclassification and perturbation. These two classifiers suit different types of real applications.

6 CONCLUSION AND FUTURE WORK

In this paper, we develop a rule-based classifier using a novel framework GFRS. We first propose a robust model of FRS (i.e., GFRS), and then design a classifier-building method based on GFRS. The key idea of building the classifier is to keep the discernibility information in the original data sets invariant. The proposed GFRS model, as the foundation of classifier building, has a sound mathematical rationale. By using strict mathematical reasoning, we not only show the reasonability of the new concept "consistence degree," but also design a discernibility vector. This is the main contribution of this paper. One advantage of this paper is that the classifier GFRSC is robust, since GFRS is robust with respect to misclassification and perturbation. The main advantage of GFRS is that knowledge representation and knowledge discovery are closely related. That is, the fuzzy lower approximation operator (i.e., consistence degree) is used in attribute value reduction. This point distinguishes GFRSC from other classifiers built by using FRS.

In the future, we would like to do some more work on GFRS. One direction of our future work is to give a more detailed discussion on the parameter in GFRS. Now, one guideline for specifying this parameter is that, within a certain reasonable extent, the larger the parameter in GFRSC is, the more compact the obtained rule set is. However, it is difficult to specify the extent of this parameter.

A discussion on GFRS and probability theory is another direction of our future work. Our proposed GFRS model intends to solve the problem of data perturbation. If data perturbation is due to randomness, then another formal approach may assume that fuzzy data are obtained under a randomness process. Therefore, a framework based on the notion of random fuzzy-rough sets may be developed.

APPENDIX

Here, we present and exemplify some notions of fuzzy logical operators [4], [10], [11], [55]: triangular norm, triangular conorm, negator, dual, T-residuated implication, and its dual operation.

A triangular norm, or shortly T-norm, is a function T : [0,1] × [0,1] → [0,1] that satisfies the following conditions: monotonicity (T(x, y) ≤ T(x', y') whenever x ≤ x' and y ≤ y'); commutativity (T(x, y) = T(y, x)); associativity (T(T(x, y), z) = T(x, T(y, z))); and the boundary condition (T(x, 1) = x). The most popular continuous T-norms include the standard min operator (the largest T-norm) T_M(x, y) = min{x, y} and the bounded intersection (also called the Lukasiewicz T-norm) T_L(x, y) = max{0, x + y − 1}.

A triangular conorm, or shortly T-conorm, is an increasing, commutative, and associative function S : [0,1] × [0,1] → [0,1] that satisfies the boundary condition: for all x ∈ [0,1], S(x, 0) = x. The well-known continuous T-conorms include the standard max operator (the smallest T-conorm) S_M(x, y) = max{x, y} and the bounded sum S_L(x, y) = min{1, x + y}.

A negator N is a decreasing function N : [0,1] → [0,1] that satisfies N(0) = 1 and N(1) = 0. A negator N is called involutive iff N(N(x)) = x for all x ∈ [0,1]; it is called weakly involutive iff N(N(x)) ≤ x for all x ∈ [0,1]. The standard negator is defined as N_S(x) = 1 − x. Given a negator N, a T-norm T and a T-conorm S are called dual w.r.t. N iff the De Morgan laws are satisfied, i.e., S(N(x), N(y)) = N(T(x, y)) and T(N(x), N(y)) = N(S(x, y)).

Given a lower semicontinuous triangular norm T, the residuation implication, also called the T-residuated implication, is a function ϑ : [0,1] × [0,1] → [0,1] that satisfies ϑ(x, y) = sup{z | z ∈ [0,1], T(x, z) ≤ y} for every x, y ∈ [0,1]. Two examples of T-residuated implications are the Gödel implication ϑ_M, based on T_M: ϑ_M(x, y) = 1 if x ≤ y, and y if x > y; and the Lukasiewicz implication ϑ_L, based on T_L: ϑ_L(x, y) = min{1 − x + y, 1}.

Given an upper semicontinuous triangular conorm S, the dual of the T-residuated implication w.r.t. N is a function σ : [0,1] × [0,1] → [0,1] that satisfies σ(x, y) = inf{z | z ∈ [0,1], S(x, z) ≥ y} for every x, y ∈ [0,1]. σ(N(x), N(y)) = N(ϑ(x, y)) and ϑ(N(x), N(y)) = N(σ(x, y)) hold for an involutive negator N. Examples of the dual of the T-residuated implication w.r.t. N include the dual of the Gödel implication σ_M, based on S_M: σ_M(x, y) = 0 if x ≥ y, and y if x < y; and the dual of the Lukasiewicz implication σ_L, based on S_L: σ_L(x, y) = max{y − x, 0}.

ACKNOWLEDGMENTS

This work was supported in part by the Hong Kong RGC CERG research grants: PolyU 5273/05E (B-Q943) and PolyU 5281/07E (B-Q06C). Chen Degang is supported by the NSFC (70871036). This work is also partially supported by the NSF of Hebei Province (F2008000635, 08963522D, 06213548).

REFERENCES

[1] Z. Pawlak and A. Skowron, "Rough Sets: Some Extensions," Information Sciences, vol. 177, pp. 28-40, 2007.
[2] L.A. Zadeh, "Fuzzy Sets," Information and Control, vol. 8, pp. 338-353, 1965.
[3] D. Dubois and H. Prade, "Rough Fuzzy Sets and Fuzzy Rough Sets," Int'l J. General Systems, vol. 17, pp. 191-208, 1990.
[4] D. Dubois and H. Prade, "Putting Rough Sets and Fuzzy Sets Together," Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, R. Slowinski, ed., pp. 203-232, Kluwer Academic Publishers, 1992.
[5] J.S. Mi and W.X. Zhang, "An Axiomatic Characterization of a Fuzzy Generalization of Rough Sets," Information Sciences, vol. 160, pp. 235-249, 2004.
[6] W.Z. Wu, J.S. Mi, and W.X. Zhang, "Generalized Fuzzy Rough Sets," Information Sciences, vol. 151, pp. 263-282, 2003.
[7] W.Z. Wu and W.X. Zhang, "Constructive and Axiomatic Approaches of Fuzzy Approximation Operators," Information Sciences, vol. 159, pp. 233-254, 2004.
[8] D.G. Chen, X.Z. Wang, D.S. Yeung, and E.C.C. Tsang, "Rough Approximations on a Complete Completely Distributive Lattice with Applications to Generalized Rough Sets," Information Sciences, vol. 176, pp. 1829-1848, 2006.
[9] D.S. Yeung, D.G. Chen, E.C.C. Tsang, J.W.T. Lee, and X.Z. Wang, "On the Generalization of Fuzzy Rough Sets," IEEE Trans. Fuzzy Systems, vol. 13, no. 3, pp. 343-361, June 2005.
[10] N.N. Morsi and M.M. Yakout, "Axiomatics for Fuzzy Rough Sets," Fuzzy Sets and Systems, vol. 100, pp. 327-342, 1998.
[11] A.M. Radzikowska and E.E. Kerre, "A Comparative Study of Fuzzy Rough Sets," Fuzzy Sets and Systems, vol. 126, pp. 137-155, 2002.
[12] M.D. Cock, C. Cornelis, and E.E. Kerre, "Fuzzy Rough Sets: The Forgotten Step," IEEE Trans. Fuzzy Systems, vol. 15, no. 1, pp. 121-130, Feb. 2007.
[13] C. Cornelis, M.D. Cock, and A.M. Radzikowska, "Fuzzy Rough Sets: From Theory into Practice," Handbook of Granular Computing, W. Pedrycz, A. Skowron, and V. Kreinovich, eds., Springer-Verlag, 2008.
[14] R. Slowinski and D. Vanderpooten, "A Generalized Definition of Rough Approximations Based on Similarity," IEEE Trans. Knowledge and Data Eng., vol. 12, no. 2, pp. 331-336, Mar./Apr. 2000.
[15] J.M.F. Salido and S. Murakami, "Rough Set Analysis of a General Type of Fuzzy Data Using Transitive Aggregations of Fuzzy Similarity Relations," Fuzzy Sets and Systems, vol. 139, pp. 635-660, 2003.
[16] Y.Y. Yao, "Combination of Rough and Fuzzy Sets Based on Alpha-Level Sets," Rough Sets and Data Mining: Analysis for Imprecise Data, T.Y. Lin and N. Cercone, eds., pp. 301-321, Kluwer Academic Publishers, 1997.
[17] Q. Hu, Y. Daren, and X. Zongxia, "Fuzzy Probabilistic Approximation Spaces and Their Information Measures," IEEE Trans. Fuzzy Systems, vol. 14, no. 2, pp. 191-201, Apr. 2006.
[18] R. Jensen and Q. Shen, "Fuzzy-Rough Attributes Reduction with Application to Web Categorization," Fuzzy Sets and Systems, vol. 141, pp. 469-485, 2004.
[19] Q. Shen and R. Jensen, "Selecting Informative Features with Fuzzy-Rough Sets and Its Application for Complex Systems Monitoring," Pattern Recognition, vol. 37, pp. 1351-1363, 2004.
[20] R. Jensen and Q. Shen, "Semantics-Preserving Dimensionality Reduction: Rough and Fuzzy-Rough-Based Approaches," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 12, pp. 1457-1471, Dec. 2004.
[21] R. Jensen and Q. Shen, "Fuzzy-Rough Sets Assisted Attribute Selection," IEEE Trans. Fuzzy Systems, vol. 15, no. 1, pp. 73-89, Feb. 2007.
[22] R. Jensen and Q. Shen, "New Approaches to Fuzzy-Rough Feature Selection," IEEE Trans. Fuzzy Systems, vol. 17, no. 4, pp. 824-838, Aug. 2009.
[23] E.C.C. Tsang, D.G. Chen, D.S. Yeung, X.Z. Wang, and J.W.T. Lee, "Attribute Reduction Using Fuzzy Rough Sets," IEEE Trans. Fuzzy Systems, vol. 16, no. 5, pp. 1130-1141, 2008.
[24] D.G. Chen, E.C.C. Tsang, and S.Y. Zhao, "Attributes Reduction with Fuzzy Rough Sets," Proc. 2007 IEEE Int'l Conf. Systems, Man, and Cybernetics, vol. 1, pp. 486-491, 2007.
[25] Q.H. Hu, D.R. Yu, and Z.X. Xie, "Information-Preserving Hybrid Data Reduction Based on Fuzzy-Rough Techniques," Pattern Recognition Letters, vol. 27, pp. 414-423, 2006.
[26] Q.H. Hu, D.R. Yu, Z.X. Xie, and X.D. Li, "EROS: Ensemble Rough Subspaces," Pattern Recognition, vol. 40, pp. 3728-3739, 2007.
[27] Q.H. Hu, Z.X. Xie, and D.R. Yu, "Hybrid Attribute Reduction Based on a Novel Fuzzy-Rough Model and Information Granulation," Pattern Recognition, vol. 40, pp. 3509-3521, 2007.
[28] D. Slezak, "Approximate Reducts in Decision Tables," Proc. Sixth Int'l Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 1159-1164, 1996.
[29] D. Slezak, "Approximate Entropy Reducts," Fundamenta Informaticae, vol. 53, pp. 365-390, 2002.
[30] H.S. Nguyen and D. Slezak, "Approximate Reducts and Association Rules—Corresponding and Complexity Results," Proc. Int'l Workshop New Directions in Rough Sets, Data Mining, and Granular-Soft Computing (RSFDGrC '99), N. Zhong, A. Skowron, and S. Ohsuga, eds., pp. 137-145, 1999.
[31] C. Cornelis, G.H. Martín, R. Jensen, and D. Slezak, "Feature Selection with Fuzzy Decision Reducts," Proc. Third Int'l Conf. Rough Sets and Knowledge Technology (RSKT '08), 2008.
[32] E.C.C. Tsang, S.Y. Zhao, and J.W.T. Lee, "Rule Induction Based on Fuzzy Rough Sets," Proc. 2007 Int'l Conf. Machine Learning and Cybernetics, vol. 5, pp. 3028-3033, Aug. 2007.
[33] C. Cornelis and R. Jensen, "A Noise-Tolerant Approach to Fuzzy-Rough Feature Selection," Proc. 17th Int'l Conf. Fuzzy Systems (FUZZ-IEEE '08), 2008.
[34] R.B. Bhatt and M. Gopal, "On Fuzzy Rough Sets Approach to Feature Selection," Pattern Recognition Letters, vol. 26, pp. 1632-1640, 2005.
[35] Q. Shen and A. Chouchoulas, "A Rough-Fuzzy Approach for Generating Classification Rules," Pattern Recognition, vol. 35, no. 11, pp. 2425-2438, 2002.
[36] S.K. Pal and P. Mitra, "Case Generation Using Rough Sets with Fuzzy Representation," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 3, pp. 292-300, Mar. 2004.
[37] Y. Li, S.C.K. Shiu, and S.K. Pal, "Combining Feature Reduction and Case Selection in Building CBR Classifiers," IEEE Trans. Knowledge and Data Eng., vol. 18, no. 2, pp. 415-429, Mar. 2006.
[38] Q.H. Hu, D.R. Yu, and Z.X. Xie, "Neighborhood Classifiers," Expert Systems with Applications, vol. 34, pp. 866-876, 2008.
[39] G.Q. Cao, S.C.K. Shiu, and X.Z. Wang, "A Fuzzy-Rough Approach for the Maintenance of Distributed Case-Based-Reasoning Systems," Soft Computing, vol. 7, no. 8, pp. 491-499, 2003.
[40] M. Sarkar, "Fuzzy-Rough Nearest Neighbor Algorithms in Classification," Fuzzy Sets and Systems, vol. 158, no. 19, pp. 2134-2152, 2007.
[41] R.B. Bhatt and M. Gopal, "FRCT: Fuzzy-Rough Classification Trees," Pattern Analysis and Applications, vol. 11, no. 1, pp. 73-88, 2008.
[42] R. Yasdi, "Learning Classification Rules from Database in the Context of Knowledge Acquisition and Representation," IEEE Trans. Knowledge and Data Eng., vol. 3, no. 3, pp. 293-306, Sept. 1991.
[43] Y.C. Tsai, C.H. Cheng, and J.R. Chang, "Entropy-Based Fuzzy Rough Classification Approach for Extracting Classification Rules," Expert Systems with Applications, vol. 31, no. 2, pp. 436-443, 2006.
[44] E.C.C. Tsang and S.Y. Zhao, "Decision Table Reduction in KDD: Fuzzy Rough Based Approach," Transactions on Rough Sets, Lecture Notes in Computer Science, vol. 5946, pp. 177-188, 2010.
[45] X.Z. Wang, E.C.C. Tsang, S.Y. Zhao, D.G. Chen, and D.S. Yeung, "Learning Fuzzy Rules from Fuzzy Samples Based on Rough Set Technique," Information Sciences, vol. 177, no. 20, pp. 4493-4514, 2007.
[46] S.Y. Zhao, E.C.C. Tsang, and D.G. Chen, "The Model of Fuzzy Variable Precision Rough Sets," IEEE Trans. Fuzzy Systems, vol. 17, no. 2, pp. 451-467, Apr. 2009.
[47] A. Skowron and C. Rauszer, "The Discernibility Matrices and Functions in Information Systems," Intelligent Decision Support—Handbook of Applications and Advances of the Rough Sets Theory, R. Slowinski, ed., pp. 331-362, Kluwer Academic Publishers, 1992.
[48] Y.Y. Yao, Y. Zhao, and J. Wang, "On Reduct Construction Algorithms," Proc. First Int'l Conf. Rough Sets and Knowledge Technology (RSKT '06), pp. 297-304, July 2006.
[49] http://www.ics.uci.edu/mlearn/MLRepository.html, 2008.
[50] S.R. Safavian and D. Landgrebe, "A Survey of Decision Tree Classifier Methodology," IEEE Trans. Systems, Man, and Cybernetics, vol. 21, no. 3, pp. 660-674, May/June 1991.
[51] Y.F. Yuan and M.J. Shaw, "Introduction of Fuzzy Decision Tree," Fuzzy Sets and Systems, vol. 69, pp. 125-139, 1995.
[52] J.R. Quinlan, See5, Release 1.13, http://www.rulequest.com/, 2009.
[53] J.C. Bezdek and J.O. Harris, "Fuzzy Partitions and Relations: An Axiomatic Basis of Clustering," Fuzzy Sets and Systems, vol. 84, pp. 143-153, 1996.
[54] T. Sudkamp, "Similarity, Interpolation, and Fuzzy Rule Construction," Fuzzy Sets and Systems, vol. 58, pp. 73-86, 1993.
[55] G.J. Klir and B. Yuan, Fuzzy Logic: Theory and Applications. Prentice-Hall, 1995.
[56] W. Zhu and F.Y. Wang, "On Three Types of Covering-Based Rough Sets," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 8, pp. 1131-1144, Aug. 2007.
[57] S.H. Nguyen and H.S. Nguyen, "Some Effective Algorithms for Rough Set Methods," Proc. Conf. Int'l Processing and Management of Uncertainty in Knowledge Based Systems, pp. 1451-1456, 1996.
638 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 5, MAY 2010
Suyun Zhao (M’06) received the bachelor’s and master’s degrees from the School of Mathematics and Computer Science, Hebei University, Baoding, China, in 2001 and 2004, respectively. She is currently working toward the PhD degree in the Department of Computing, The Hong Kong Polytechnic University. Her research interests are in the areas of machine learning, pattern recognition, fuzzy sets, and rough sets. She is a student member of the IEEE.

Eric C.C. Tsang received the BSc degree in computer studies from the City University of Hong Kong, Kowloon, in 1990, and the PhD degree in computing from The Hong Kong Polytechnic University, in 1996. He is an assistant professor in the Department of Computing, The Hong Kong Polytechnic University. His main research interests are in the areas of fuzzy expert systems, fuzzy neural networks, machine learning, genetic algorithms, and fuzzy support vector machines.

Degang Chen received the MS degree from Northeast Normal University, Changchun, Jilin, China, in 1994, and the PhD degree from Harbin Institute of Technology, China, in 2000. He worked as a postdoctoral fellow with Xi’an Jiaotong University, China, from 2000 to 2002 and with Tsinghua University, China, from 2002 to 2004. Since 2006, he has been a professor at North China Electric Power University, Beijing, China. His research interests include fuzzy groups, fuzzy analysis, rough sets, and SVMs.

XiZhao Wang received the PhD degree in computer science from Harbin Institute of Technology, China, in 1998. He is presently the dean and a professor of the College of Mathematics and Computer Science, Hebei University, China. From September 1998 to September 2001, he served as a research fellow in the Department of Computing, The Hong Kong Polytechnic University. He became a full professor and the dean of the College of Mathematics and Computer Science at Hebei University in October 2001. His main research interests include learning from examples with fuzzy representation, fuzzy measures and integrals, neurofuzzy systems and genetic algorithms, feature extraction, multiclassifier fusion, and applications of machine learning. He has more than 160 publications, including four books, seven book chapters, and more than 90 journal papers in the IEEE Transactions on Pattern Analysis and Machine Intelligence, the IEEE Transactions on Systems, Man, and Cybernetics, the IEEE Transactions on Fuzzy Systems, Fuzzy Sets and Systems, Pattern Recognition, etc. He has been the PI/Co-PI of 16 research projects supported partially by the National Natural Science Foundation of China and the Research Grant Committee of the Hong Kong Government. He is an IEEE senior member (Board of Governors member in 2005, 2007-2009), the chair of the IEEE SMC Technical Committee on Computational Intelligence, an associate editor of the IEEE Transactions on SMC, Part B, an associate editor of Pattern Recognition and Artificial Intelligence, a member of the editorial board of Information Sciences, and an executive member of the Chinese Association of Artificial Intelligence. He was the recipient of the IEEE-SMCS Outstanding Contribution Award in 2004 and the IEEE-SMCS Best Associate Editor Award in 2006. He is the general cochair of the 2002-2009 International Conferences on Machine Learning and Cybernetics, cosponsored by the IEEE SMCS. He is a distinguished lecturer of the IEEE SMC Society.