© The Author 2012. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: journals.permissions@oup.com
doi:10.1093/comjnl/bxs001
1 Intelligence Group, School of Computer Science, Indian Institute of Information Technology and Management, Kerala, India
2 Queensland Micro- and Nanotechnology Centre and Griffith School of Engineering, Griffith University, Nathan, Australia
Corresponding author: apj@ieee.org
1. INTRODUCTION
The high feature dimensionality of realistic datasets adversely affects the recognition accuracy of nearest neighbor (NN) classifiers. To address this issue, we introduce a nearest feature (NF) classifier that shifts the NN concept from the global decision level to the level of individual features. Performance comparisons with 12 instance-based classifiers on 13 benchmark University of California Irvine (UCI) classification datasets show average improvements of 6% and 3.5% in the recognition accuracy and area under the curve (AUC) performance measures, respectively. The statistical significance of the observed performance improvements is verified by the Friedman test and by the post hoc Bonferroni–Dunn test. In addition, the application of the classifier is demonstrated on face recognition databases, a character recognition database and medical diagnosis problems for binary and multi-class diagnosis on databases comprising morphological and gene expression features.
2. RELATED WORK
3.
3.1.
The idea of NFs is implemented as a conversion of the feature-to-feature distances (d_i) to NF decisions, labeled as NF votes (v_i) in Table 1. This conversion requires threshold values θ_i, which are obtained from the standard deviations of the feature-to-feature inter-class distances.
Consider a feature vector G = {g_i, i ∈ [1, M], M ∈ ℤ} in a training set having a class label c and feature number i. We can define the inter-class distances of a feature as

d_i = |g_i(c) - g_i(c')|,   (1)

where g_i(c') belongs to a feature from any class other than c.
TABLE 1. The proposed nearest feature vote calculation to identify the class of unknown object Z from the Setosa, Versicolor and Virginica classes in the Iris plant database. Feature values g_i and feature-to-feature distances d_i are in cm; the NF votes v_i are computed with w = 1 and thresholds θ_1 = 0.4 cm, θ_2 = 0.4 cm, θ_3 = 1.4 cm and θ_4 = 0.6 cm.

| Class label | g_1 | g_2 | g_3 | g_4 | d_1 | d_2 | d_3 | d_4 | v_1 | v_2 | v_3 | v_4 | Object similarity score |
| Setosa S1 | 5.1 | 3.5 | 1.4 | 0.2 | 1.4 | 0.5 | 4.4 | 2.0 | 0 | 0 | 0 | 0 | 0 |
| Setosa S2 | 4.9 | 3.0 | 1.4 | 0.2 | 1.6 | 0.0 | 4.4 | 2.0 | 0 | 1 | 0 | 0 | 1 |
| Versicolor VC1 | 5.5 | 2.3 | 4.0 | 1.3 | 1.0 | 0.7 | 1.8 | 0.9 | 0 | 0 | 0 | 0 | 0 |
| Versicolor VC2 | 5.9 | 3.0 | 4.2 | 1.5 | 0.6 | 0.0 | 1.6 | 0.7 | 0 | 1 | 0 | 0 | 1 |
| Versicolor VC3 | 5.2 | 2.7 | 3.9 | 1.4 | 1.3 | 0.3 | 1.9 | 0.8 | 0 | 1 | 0 | 0 | 1 |
| Virginica V1 | 6.3 | 3.3 | 6.0 | 2.5 | 0.2 | 0.3 | 0.2 | 0.3 | 1 | 1 | 1 | 1 | 4 |
| Virginica V2 | 5.8 | 2.7 | 5.1 | 1.9 | 0.7 | 0.3 | 0.7 | 0.3 | 1 | 0 | 1 | 1 | 3 |
| Unknown Z | 6.5 | 3.0 | 5.8 | 2.2 | – | – | – | – | – | – | – | – | – |
The conversion of the distances to NF votes is performed as

v_i = 1 if d_i ≤ θ_i,
v_i = 0 otherwise,   (2)

where θ_i = wσ_i and σ_i is the standard deviation of the inter-class distances of feature i; the votes in Table 1 are computed with w = 1.
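As a concrete illustration, the following is a minimal Python sketch (ours, not the authors' code) of Equations (1) and (2) applied to one gallery object; the thresholds θ_i are assumed to be given.

```python
import numpy as np

def nf_votes(z, g, theta):
    """Nearest-feature votes of gallery object g for test object z (Eq. 2).

    z, g and theta are 1-D arrays of length M (features and per-feature
    thresholds). A feature votes 1 when its feature-to-feature distance
    (Eq. 1 form) falls within the threshold, and 0 otherwise.
    """
    d = np.abs(g - z)            # feature-to-feature distances
    return (d <= theta).astype(int)

# Example from Table 1: test object Z against Virginica V1.
z = np.array([6.5, 3.0, 5.8, 2.2])
v1 = np.array([6.3, 3.3, 6.0, 2.5])
theta = np.array([0.4, 0.4, 1.4, 0.6])   # w = 1, so theta_i = sigma_i
votes = nf_votes(z, v1, theta)
print(votes, votes.sum())                # [1 1 1 1] 4 (object similarity score)
```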
Global decisions

The votes representing the NFs for each object are added up to obtain the global similarity score for each of the objects, as illustrated in the column labeled 'object similarity score' in Table 1.

FIGURE 1. Illustrations of the feature-to-feature comparisons between the test and gallery objects shown in Table 1 for the four features from the Iris plant database: (a) sepal height, (b) sepal width, (c) petal height and (d) petal width. The NFs are those that fall in the shaded regions, defined by the threshold distances θ_i.

To make the global decision about the class of the test object Z, the top-rank method is attempted in the first instance. In this method, the object with the highest similarity score in each class (the top-ranked object in each class) is taken to represent the whole class. This means that the global similarity scores of the top-ranked objects in each class are taken as the values of the cumulative class similarity scores, s_g(c), shown in the rank-1 columns of Table 2.

These similarity scores can be normalized into class likelihood similarity scores, s_n, by dividing by the maximum similarity score across the classes (Table 2). The decision on identifying the class c of the test object Z is obtained by finding the class that has the maximum value of s_n:

c = argmax_c s_n(c).   (3)
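The top-rank decision of Equation (3) can be sketched in a few lines; the dictionary-based interface below is our own illustration, not the paper's implementation.

```python
def top_rank_decision(scores_by_class):
    """Top-rank global decision (Eq. 3), a sketch of the method above.

    scores_by_class maps a class label to the object similarity scores of
    its gallery objects. The best score in each class represents the class;
    scores are normalized by the maximum across classes and the argmax wins.
    """
    s_g = {c: max(s) for c, s in scores_by_class.items()}   # rank-1 scores
    s_max = max(max(s_g.values()), 1)                       # guard against all-zero
    s_n = {c: v / s_max for c, v in s_g.items()}            # class likelihoods
    return max(s_n, key=s_n.get), s_n

# Scores from Table 1: Setosa (0, 1), Versicolor (0, 1, 1), Virginica (4, 3).
label, s_n = top_rank_decision({'Setosa': [0, 1],
                                'Versicolor': [0, 1, 1],
                                'Virginica': [4, 3]})
print(label, s_n)   # Virginica {'Setosa': 0.25, 'Versicolor': 0.25, 'Virginica': 1.0}
```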
TABLE 2. Cumulative class similarity scores, s_g(c), and normalized class likelihood scores, s_n, for the test object Z of Table 1 at ranks 1 and 2.

| Class, c | Rank 1: s_g(c) | Rank 1: s_n | Rank 2: s_g(c) | Rank 2: s_n |
| Setosa | 1 | 1/4 = 0.25 | 1 | 1/7 = 0.14 |
| Versicolor | 1 | 1/4 = 0.25 | 2 | 2/7 = 0.28 |
| Virginica | 4 | 4/4 = 1.00 | 7 | 7/7 = 1.00 |

3.4. Method summary
Algorithm 1 NF classifier

Training Stage
Requires: A training set with M features per object, G = {g_i, i ∈ [1, M]} (there is a total of c classes and a maximum of N objects per class).
Steps:
1. Calculate the inter-class feature-to-feature distances d_i = |g_i(c) - g_i(c')|, where g_i(c') belongs to a feature from any class other than c.
2. Calculate the standard deviation σ_i from the histogram distribution of d_i.
3. Determine a threshold θ_i = wσ_i, where w is found by maximizing the product of the inter-class and intra-class distance distributions originating from the training set.

Testing Stage
Requires: A training set with M features per object, G = {g_i, i ∈ [1, M]} (there is a total of c classes and a maximum of N objects per class); a test object Z, whose class is unknown.
Steps:
1. Calculate the feature-to-feature distance d_i = |g_i(c) - g_i(Z)|, where g_i(Z) is the i-th feature of object Z and g_i(c) is a feature from an object belonging to class c.
2. Find the nearest features by selecting those features whose distances d_i fall within the threshold θ_i. Each nearest feature of an object is assigned a vote v_i = 1.
3. Calculate the object similarity score of each training object by counting the total number of votes v_i = 1 (the nearest features) associated with the object.
4. Add the object similarity scores of the m top-ranked training objects in each class to obtain the cumulative class similarity score for the m-th rank.
5. Normalize the cumulative class similarity scores by dividing by the rank-wise maximum score across classes to obtain the class likelihood scores s_n. The class of the training object that has the highest score s_n at the top rank (m = 1) is assigned as the class of the object Z. In the event of an inconclusive decision for m = 1, s_n for m = 2 is used for the decision.
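For readers who prefer code, the following is a minimal Python sketch of Algorithm 1 as we read it. The class name and interface are ours; the weight w is passed in directly rather than optimized from the distance distributions as in Fig. 2, and only absolute feature differences are used. The rank loop implements the tie-breaking by lower ranks described next.

```python
import numpy as np

class NFClassifier:
    """Minimal sketch of Algorithm 1 (not the authors' implementation)."""

    def __init__(self, w=1.0):
        self.w = w

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.X_, self.y_ = X, y
        # Training steps 1-3: per-feature std of inter-class distances.
        diffs = [np.abs(a - b) for i, a in enumerate(X)
                 for j, b in enumerate(X) if y[i] != y[j]]
        self.theta_ = self.w * np.std(np.array(diffs), axis=0)  # theta_i = w*sigma_i
        return self

    def predict(self, z, max_rank=2):
        z = np.asarray(z, float)
        votes = (np.abs(self.X_ - z) <= self.theta_).astype(int)  # testing 1-2
        scores = votes.sum(axis=1)                                 # step 3
        classes = np.unique(self.y_)
        for m in range(1, max_rank + 1):                           # steps 4-5
            s_g = np.array([np.sort(scores[self.y_ == c])[::-1][:m].sum()
                            for c in classes])
            s_n = s_g / max(s_g.max(), 1)                          # normalize
            best = classes[s_n.argmax()]
            if (s_n == s_n.max()).sum() == 1:                      # unique winner
                return best
        return best   # still tied after max_rank: return one of the leaders
```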
If a prediction results in a tie after the top-rank classification, then the s_n values from the next lower rank can be added to represent the score for the class. By adopting lower ranks in the score calculations, the indecisiveness of class selection resulting from ties can be resolved. Clearly, including lower ranks in this manner makes the global decision of the NF classifier non-parametric. Furthermore, the maximum number of ranks that are available for a classification problem will be limited by the number of features in an object, because the s_n values originate from integer scores that are bounded by the total number of features in an object.
The use of classical majority voting [1, 2] can further improve
the global decisions when the number of objects per class in the
gallery is more than one. It should be noted that majority voting
is only useful when there are one or more intra-class gallery
objects that have an equal number of features with near-zero
feature-to-feature distances. In classification problems, such
situations occur more frequently for a large number of gallery
objects with low-dimensional feature vectors and less frequently
for a small number of gallery objects with high-dimensional
feature vectors.
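A hedged sketch of this refinement: assuming the gallery objects that tie for the best similarity score are already known, the classical majority vote [1, 2] reduces to a label count.

```python
from collections import Counter

def majority_vote(labels_of_tied_objects):
    """Classical majority vote over the gallery objects that tie for the
    best similarity score; a sketch of the refinement described above."""
    return Counter(labels_of_tied_objects).most_common(1)[0][0]

# Three gallery objects share the top score; two of them are Virginica.
print(majority_vote(['Virginica', 'Virginica', 'Versicolor']))  # Virginica
```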
FIGURE 2. (a) Inter-class and intra-class distributions of the normalized distances for a training set of the Iris database; (b) the product of the inter- and intra-class distributions, illustrating that the normalized distance corresponding to the maximum value of this product is automatically selected as the normalized threshold, or weight parameter, w; (c) recognition accuracies for different values of w in comparison with the recognition accuracy of the NN classifier.
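The selection of w described in panel (b) can be sketched as a histogram-product maximization; the bin count and density normalization below are our assumptions, not specified in the text.

```python
import numpy as np

def select_weight(intra_d, inter_d, bins=50):
    """Pick the normalized threshold w at the peak of the product of the
    intra- and inter-class distance distributions (cf. Fig. 2); a sketch.

    intra_d, inter_d: 1-D arrays of normalized feature distances gathered
    from the training set.
    """
    intra_d, inter_d = np.asarray(intra_d, float), np.asarray(inter_d, float)
    edges = np.linspace(0.0, max(intra_d.max(), inter_d.max()), bins + 1)
    h_intra, _ = np.histogram(intra_d, bins=edges, density=True)
    h_inter, _ = np.histogram(inter_d, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[(h_intra * h_inter).argmax()]   # peak of the product
```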
5.

5.1. Databases

TABLE 3. Recognition accuracies of the proposed method in comparison with 11 classifiers using 13 UCI databases. The number next to each dataset name represents the number of features in each feature vector within the database; standard deviations are given in parentheses.

| Dataset | NF | k-NN | Ik-NN | Dk-NN | NN | K* | LWL | VFI | NNge | AdaBoost | Bagging | LogitBoost |
| Balance (4) | 89.6(2.2) | 89.2(1.3) | 89.2(1.1) | 89.2(1.0) | 78.5(2.5) | 87.5(1.3) | 62.6(2.7) | 68.5(15.7) | 79.1(2.9) | 74.6(3.5) | 82.7(3.1) | 86.7(1.7) |
| Breast (9) | 97.2(0.9) | 96.1(0.8) | 96.2(0.8) | 96.2(0.8) | 95.8(0.8) | 94.9(1.2) | 92.0(1.8) | 95.3(1.4) | 95.9(1.0) | 94.6(1.1) | 95.3(1.2) | 95.6(0.8) |
| Credit (15) | 87.4(1.8) | 85.8(1.7) | 85.7(1.3) | 86.2(1.4) | 81.3(1.8) | 78.6(1.9) | 86.1(1.3) | 85.2(1.9) | 82.7(2.5) | 85.6(1.8) | 85.7(1.2) | 86.4(1.8) |
| Pima (8) | 77.7(1.8) | 73.6(2.0) | 74.3(2.1) | 73.9(2.2) | 70.4(2.2) | 70.2(2.3) | 71.9(3.5) | 54.8(11.4) | 72.7(2.4) | 75.6(2.0) | 76.2(2.1) | 75.5(1.5) |
| Zoo (17) | 96.9(4.9) | 94.3(3.4) | 94.5(3.8) | 94.4(3.9) | 95.4(3.2) | 96.7(2.5) | 86.3(2.8) | 94.8(2.4) | 94.2(2.9) | 60.3(1.3) | 42.2(1.2) | 93.4(4.1) |
| Glass (9) | 70.7(3.3) | 66.3(4.2) | 66.8(5.0) | 66.5(5.8) | 67.5(5.3) | 73.1(4.0) | 45.8(2.5) | 55.6(6.0) | 65.7(4.8) | 44.8(1.5) | 71.0(3.8) | 68.2(4.1) |
| Statlog (13) | 85.8(2.9) | 79.7(3.4) | 79.5(2.7) | 80.4(2.1) | 76.1(2.6) | 75.7(2.4) | 71.8(2.0) | 77.9(2.3) | 76.5(3.5) | 80.7(2.5) | 79.1(3.5) | 80.4(2.8) |
| Vehicle (18) | 84.7(1.6) | 67.6(2.4) | 69.3(2.2) | 69.3(2.1) | 68.7(1.9) | 69.5(1.7) | 45.2(2.1) | 51.9(2.7) | 60.3(3.8) | 40.3(0.7) | 72.1(1.8) | 70.3(2.3) |
| Iris (4) | 97.6(2.2) | 95.3(2.3) | 95.5(2.4) | 95.3(2.2) | 95.3(2.6) | 94.3(2.3) | 94.2(2.3) | 95.8(2.7) | 95.8(1.8) | 94.7(2.4) | 94.8(2.3) | 95.3(1.7) |
| Hepatitis (19) | 86.9(4.3) | 82.5(3.4) | 82.2(3.1) | 82.0(2.6) | 80.3(4.0) | 80.7(4.0) | 79.9(4.8) | 82.6(4.0) | 81.3(3.5) | 80.8(3.1) | 80.8(3.7) | 80.9(3.8) |
| Vote (16) | 98.5(1.6) | 92.2(2.1) | 92.4(2.0) | 92.4(2.0) | 91.8(2.0) | 92.5(1.7) | 95.3(1.2) | 91.7(1.7) | 94.8(1.5) | 95.3(1.2) | 95.3(1.2) | 95.1(1.3) |
| Ionosphere (34) | 98.2(2.5) | 88.8(1.4) | 85.8(2.4) | 85.7(2.4) | 86.1(2.1) | 82.3(1.9) | 83.0(3.3) | 92.8(2.4) | 88.8(2.8) | 90.2(2.7) | 91.0(3.0) | 90.8(2.3) |
| Sonar (60) | 98.1(1.5) | 84.3(4.0) | 84.4(4.4) | 84.4(4.4) | 84.5(4.3) | 83.8(4.4) | 71.6(4.3) | 59.7(5.2) | 68.1(6.6) | 74.9(4.5) | 76.3(5.3) | 76.4(4.4) |

5.1.1. Results
Tables 3 and 4 show the obtained recognition accuracies and areas under the curve (AUC), respectively. The recognition performance of the proposed NF classifier is better in most cases, whereas the AUC values are comparable. The average weight values for the threshold selection for each dataset are shown in Fig. 3. The wide variation in the weight values is due to the variability of the data in the databases.
TABLE 4. AUC of the proposed method in comparison with 11 methods using 13 UCI datasets (standard deviations in parentheses).

| Dataset | NF | k-NN | Ik-NN | Dk-NN | NN | K* | LWL | VFI | NNge | AdaBoost | Bagging | LogitBoost |
| Balance | 0.90(0.01) | 0.99(0.01) | 0.99(0.01) | 0.99(0.01) | 0.86(0.02) | 0.98(0.01) | 0.83(0.03) | 0.93(0.01) | 0.86(0.03) | 0.90(0.03) | 0.95(0.02) | 0.98(0.01) |
| Breast | 0.96(0.01) | 0.99(0.01) | 0.99(0.01) | 0.99(0.01) | 0.95(0.01) | 0.99(0.00) | 0.98(0.01) | 0.99(0.01) | 0.96(0.01) | 0.99(0.00) | 0.99(0.01) | 0.99(0.00) |
| Credit | 0.84(0.01) | 0.91(0.02) | 0.91(0.01) | 0.91(0.01) | 0.81(0.02) | 0.85(0.02) | 0.92(0.01) | 0.90(0.02) | 0.82(0.03) | 0.93(0.01) | 0.91(0.02) | 0.93(0.01) |
| Pima | 0.72(0.02) | 0.78(0.03) | 0.79(0.02) | 0.79(0.02) | 0.67(0.02) | 0.73(0.02) | 0.78(0.02) | 0.59(0.03) | 0.69(0.03) | 0.80(0.02) | 0.82(0.02) | 0.81(0.02) |
| Zoo | 0.99(0.01) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.02) | 1.00(0.00) | 0.99(0.01) | 1.00(0.02) | 0.51(0.01) | 1.00(0.00) |
| Glass | 0.91(0.01) | 0.80(0.07) | 0.84(0.07) | 0.81(0.07) | 0.78(0.04) | 0.91(0.03) | 0.84(0.04) | 0.78(0.06) | 0.76(0.05) | 0.71(0.02) | 0.90(0.03) | 0.87(0.03) |
| Statlog | 0.82(0.03) | 0.85(0.04) | 0.86(0.04) | 0.86(0.02) | 0.76(0.03) | 0.83(0.02) | 0.86(0.02) | 0.84(0.02) | 0.76(0.04) | 0.88(0.03) | 0.87(0.02) | 0.88(0.03) |
| Vehicle | 0.90(0.01) | 0.75(0.07) | 0.79(0.04) | 0.79(0.04) | 0.66(0.02) | 0.78(0.02) | 0.76(0.02) | 0.74(0.03) | 0.61(0.04) | 0.66(0.02) | 0.85(0.01) | 0.83(0.02) |
| Iris | 0.99(0.01) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) | 1.00(0.00) |
| Hepatitis | 0.87(0.04) | 0.78(0.06) | 0.82(0.04) | 0.82(0.04) | 0.69(0.06) | 0.81(0.06) | 0.77(0.07) | 0.85(0.06) | 0.66(0.06) | 0.81(0.05) | 0.82(0.07) | 0.81(0.05) |
| Vote | 0.99(0.01) | 0.97(0.02) | 0.98(0.01) | 0.97(0.01) | 0.92(0.02) | 0.98(0.01) | 0.98(0.01) | 0.98(0.01) | 0.95(0.01) | 0.99(0.00) | 0.98(0.01) | 0.99(0.02) |
| Ionosphere | 0.99(0.02) | 0.86(0.02) | 0.97(0.01) | 0.97(0.01) | 0.82(0.03) | 0.95(0.02) | 0.94(0.02) | 0.96(0.01) | 0.88(0.02) | 0.94(0.02) | 0.95(0.02) | 0.95(0.02) |
| Sonar | 0.98(0.02) | 0.85(0.05) | 0.91(0.03) | 0.91(0.03) | 0.84(0.04) | 0.93(0.03) | 0.81(0.05) | 0.64(0.06) | 0.68(0.06) | 0.83(0.04) | 0.84(0.05) | 0.83(0.04) |

5.2. Computational complexity
FIGURE 4. The ratio of the number of NFs to the total number of features used for similarity calculation in the 13 databases.
TABLE 5. Error rates (%) of the NF classifier in comparison with adaptive NN classifiers on five UCI datasets (standard deviations in parentheses).

| Dataset | NF | AQK | Machete | Scythe | DANN | Parzen |
| Iris (4) | 2.4(2.2) | 5.5(3.7) | 6.8(4.2) | 6.0(3.5) | 6.9(3.8) | 6.4(3.4) |
| Sonar (60) | 1.9(1.5) | 15.7(2.8) | 26.6(3.7) | 22.4(4.5) | 13.3(3.7) | 16.1(3.5) |
| Vote (16) | 1.5(1.6) | 6.0(2.2) | 6.3(2.7) | 5.9(2.8) | 5.4(1.9) | 9.3(2.4) |
| Ionosphere (34) | 1.8(2.5) | 12.0(1.8) | 19.4(2.9) | 12.8(2.6) | 12.6(2.3) | 11.0(2.7) |
| Hepatitis (19) | 13.1(4.3) | 14.3(2.3) | 18.6(4.0) | 17.9(4.3) | 15.1(4.2) | 19.4(4.5) |
In the past, it has been observed that the use of extracted features in classifiers results in better recognition rates than the use of raw features, such as pixels. However, a general classifier should be able to utilize features both in their original form and in an extracted form. Image classification is an example of a problem where either raw or extracted features can be employed. In this section, we select two representative image-recognition examples to demonstrate the performance of the proposed NF classifier: (1) face recognition using raw features and (2) handwritten character recognition using raw and extracted features.

6.1. Face recognition

The raw images are first normalized as

n(i, j) = a_1 + (g(i, j) - ḡ(i, j)) / (b_1 σ_g(i, j)),   (4)

where ḡ(i, j) and σ_g(i, j) denote the local mean and standard deviation of the pixel intensities g(i, j), and a_1 and b_1 are constants.
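Equation (4) can be read as a local mean-and-variance normalization; the sketch below implements that reading with a uniform window, where the window size and the constants a_1 and b_1 are illustrative assumptions only.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def normalize_image(g, a1=0.5, b1=3.0, size=15):
    """Local normalization in the spirit of Equation (4); a sketch, with the
    local mean and standard deviation taken over a size x size window and
    the constants a1, b1 chosen here for illustration only."""
    g = g.astype(float)
    mean = uniform_filter(g, size)                    # local mean, g-bar
    var = uniform_filter(g * g, size) - mean ** 2     # local variance
    sigma = np.sqrt(np.maximum(var, 1e-12))           # local std, sigma_g
    return a1 + (g - mean) / (b1 * sigma)
```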
FIGURE 6. Illustration of the effect of the normalization performed by Equation (4): (a) the distribution of pixel intensities of a raw image is approximately exponential; (b) the distribution of pixel intensities of the normalized image is approximately Gaussian. The normalization brings out features that are otherwise not visible to the human eye.
TABLE 7. Face recognition accuracies (%) of the NF and k-NN classifiers; Gallery and Test give the numbers of gallery and test images used from each database.

| Database | Gallery | Test | NF | k-NN |
| AR [35] | 7 | 19 | 94 | 84 |
| PUT [36] | 22 | 11 | 90 | 83 |
| EYaleB [37] | 9 | 576 | 100 | 96 |
6.2. Character recognition

… to form the raw feature vectors. The transformations applied to the raw features are Zernike moments, Karhunen–Loève features and Fourier descriptors. The Zernike features consist of 47 rotationally invariant Zernike moments and 6 morphological features. The Karhunen–Loève transformation results in 64 features, and the Fourier transformation results in 76 two-dimensional shape descriptors.
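As an illustration of this feature extraction step, Zernike moments can be computed with an off-the-shelf library; the sketch below uses mahotas (our choice, not necessarily the authors'), and the radius and degree parameters are illustrative rather than the paper's exact configuration of 47 moments.

```python
import mahotas  # one library that provides Zernike moments

def zernike_features(binary_img, radius=64, degree=12):
    """Rotation-invariant Zernike moments for a character image; a sketch
    only: the paper's 47 Zernike moments plus 6 morphological features are
    not reproduced here, and radius/degree are illustrative choices."""
    return mahotas.features.zernike_moments(binary_img, radius, degree=degree)
```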
FIGURE 5. Examples of face images from (a) AR database [35], (b) PUT database [36] and (c) EYaleB database [37].
TABLE 9. Medical diagnosis datasets used in the experiments: class types, number of objects per class and number of features.

| Dataset | Class types | Number of objects per class | Number of features |
| WBCD | Benign / Malignant | 458 / 241 | 10 |
| WDBC | Benign / Malignant | 357 / 212 | 33 |
| Acute-1 | Nephritis / No nephritis | 70 / 50 | 6 |
| Acute-2 | Inflammation / No inflammation | 61 / 59 | 6 |
| Colon | Normal / Tumor | 40 / 22 | 2000 |
| TumorC | Normal / Tumor | 39 / 21 | 7129 |
| Leukemia | Acute myeloid / Acute lymphoblastic | 25 / 47 | 7129 |
| GCM | Breast / Prostate / Lung / Colorectal / Lymphoma / Bladder / Melanoma / Uterus / Leukemia / Renal / Pancreas / Ovary / Mesothelioma / CNS | 11 / 10 / 11 / 11 / 14 / 11 / 10 / 10 / 14 / 11 / 11 / 11 / 11 / 12 | 16 063 |
FIGURE 7. Examples of images used in the handwritten character recognition experiment. There are 10 classes, each representing a handwritten integer.
TABLE 8. Character recognition performance (recognition accuracy, %) of the NF classifier in comparison with the k-NN classifier.

| Feature type | Gallery | Test | NF | k-NN |
| Zernike | 100 | 100 | 93 | 76 |
| KL | 100 | 100 | 95 | 93 |
| Fourier | 100 | 100 | 96 | 82 |
| Pixels | 100 | 100 | 98 | 96 |
TABLE 10. Recognition accuracies (%) of the NF classifier in comparison with the k-NN classifier on the medical diagnosis datasets; Gallery/Test gives the number of objects used for training and testing.

| Dataset | Gallery/Test | NF | k-NN |
| WBCD [48] | 349/150 | 97.8 | 96.3 |
| WDBC [50] | 284/284 | 97.1 | 91.6 |
| Acute-1 [51] | 60/60 | 100.0 | 92.9 |
| Acute-2 [51] | 59/61 | 100.0 | 93.2 |
| Colon [52] | 31/31 | 84.1 | 74.7 |
| Leukemia [54] | 38/34 | 97.1 | 82.8 |
| TumorC [53] | 30/30 | 77.2 | 55.1 |
| GCM [55] | 144/46 | 54.3 | 41.6 |
TABLE 11. One-versus-all (OVA) detection accuracies and average false detection rates (%) of the NF classifier for the 14 tumor classes of the GCM dataset.

| Class | OVA detection accuracy (%) | Average false detection (%) |
| Breast | 67 | 8 |
| Prostate | 100 | 43 |
| Lung | 100 | 0 |
| Colorectal | 100 | 19 |
| Lymphoma | 100 | 17 |
| Bladder | 100 | 21 |
| Melanoma | 100 | 7 |
| Uterus | 100 | 32 |
| Leukemia | 100 | 0 |
| Renal | 100 | 34 |
| Pancreas | 100 | 21 |
| Ovary | 100 | 40 |
| Mesothelioma | 100 | 31 |
| CNS | 100 | 0 |
8. CONCLUSIONS
ACKNOWLEDGEMENTS
The authors would like to acknowledge the anonymous
reviewers for their constructive comments, which have resulted
in significant improvements of the presented research.
REFERENCES
[1] Dasarathy, B. (1990) Nearest Neighbor Classification Techniques. IEEE Press, Hoboken, NJ.
[2] Duda, R.O., Hart, P.E. and Stork, D.G. (2001) Pattern Classification. Wiley, New York.
[19] Shakhnarovich, G., Darrell, T. and Indyk, P. (2006) Nearest-Neighbor Methods in Learning and Vision: Theory and Practice. MIT Press, Cambridge, MA.
[20] Short, R. and Fukunaga, K. (1984) The optimal distance measure for nearest neighbor classification. IEEE Trans. Inf. Theory, IT-27, 622–627.
[21] Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R. and Wu, A.Y. (1998) An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM, 45, 891–923.
[22] Kleinberg, J. (1997) Two Algorithms for Nearest Neighbor Search in High Dimensions. Proc. 29th Annual ACM Symp. Theory of Computing, El Paso, TX, USA, May 4–6, pp. 599–608. ACM.
[23] Indyk, P. and Motwani, R. (1998) Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. Proc. 30th Annual ACM Symp. Theory of Computing, Dallas, TX, USA, May 24–26, pp. 604–613. ACM.
[24] Akkus, A. and Guvenir, A.H. (1996) K Nearest Neighbor Classification on Feature Projections. Proc. ICML, Bari, Italy, July 3–6, pp. 12–19. Morgan Kaufmann.
[25] Demiröz, G. and Guvenir, A.H. (1997) Classification by Voting Feature Intervals. Proc. ECML-97, Prague, Czech Republic, April 23–25, pp. 85–92. Springer.
[26] Aha, D. and Kibler, D. (1991) Instance-based learning algorithms. Mach. Learn., 6, 37–66.
[27] Cleary, J.G. and Trigg, L.E. (1995) K*: An Instance-Based Learner Using an Entropic Distance Measure. Proc. 12th Int. Conf. Machine Learning, Tahoe City, CA, USA, July 9–12, pp. 108–114. Morgan Kaufmann.
[28] Frank, E., Hall, M. and Pfahringer, B. (2003) Locally Weighted Naive Bayes. Proc. 19th Conf. Uncertainty in Artificial Intelligence, Acapulco, Mexico, August 7–10, pp. 249–256. Morgan Kaufmann.
[29] Freund, Y. and Schapire, R.E. (1996) Experiments with a New Boosting Algorithm. Proc. 13th Int. Conf. Machine Learning (ICML 1996), Bari, Italy, July 3–6, pp. 148–156. Morgan Kaufmann.
[30] Breiman, L. (1996) Bagging predictors. Mach. Learn., 24, 123–140.
[31] Friedman, J., Hastie, T. and Tibshirani, R. (2000) Additive logistic regression: a statistical view of boosting. Ann. Stat., 28, 337–407.
[32] Peng, J., Heisterkamp, D.R. and Dai, H.K. (2004) Adaptive quasiconformal kernel nearest neighbor classification. IEEE Trans. Pattern Anal. Mach. Intell., 26, 656–661.
[33] Friedman, J.H. (1994) Flexible Metric Nearest Neighbour Classification. Technical Report, Department of Statistics, Stanford University.
[34] Hastie, T. and Tibshirani, R. (1996) Discriminant adaptive nearest neighbor classification. IEEE Trans. Pattern Anal. Mach. Intell., 18, 607–615.
[35] Martinez, A. and Benavente, R. (1998) The AR Face Database. CVC Technical Report 24.
[36] Kasinski, A., Florek, A. and Schmidt, A. (2008) The PUT face database. Image Process. Commun., 13, 59–64.
[37] Georghiades, A.S., Belhumeur, P.N. and Kriegman, D.J. (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell., 23, 643–660.