
Information Sciences 177 (2007) 1878–1891, www.elsevier.com/locate/ins

Shape-based image retrieval using support vector machines, Fourier descriptors and self-organizing maps
Wai-Tak Wong a, Frank Y. Shih b,*, Jung Liu b

a Department of Information Management, Chung Hua University, No. 707, Sec. 2, Wu Fu Road, Hsin Chu, Taiwan, ROC
b Computer Vision Laboratory, College of Computing Sciences, New Jersey Institute of Technology, Newark, NJ 07102, United States

Received 26 September 2005; received in revised form 17 October 2006; accepted 26 October 2006

Abstract

Image retrieval based on image content has become an important topic in the fields of image processing and computer vision. In this paper, we present a new method of shape-based image retrieval using support vector machines (SVM), Fourier descriptors and self-organizing maps. A list of predicted classes for an input shape is obtained using the SVM, ranked according to their estimated likelihood. The best match of the image to the top-ranked class is then chosen by the minimum mean square error. The nearest neighbors can be retrieved from the self-organizing map of the class. We employ three databases of 99, 216, and 1045 shapes for our experiment, and obtain prediction accuracy of 90%, 96.7%, and 84.2%, respectively. Our method outperforms some existing shape-based methods in terms of speed and accuracy.
© 2006 Elsevier Inc. All rights reserved.
Keywords: Image retrieval; Object recognition; Support vector machine; Self-organizing map; Fourier descriptor

1. Introduction

Shape-based image retrieval of similar objects from image databases has been extensively studied [13,24,26,30]. Indexing is often used to search for a specific shape in a large database [4]. A simple approach uses the nearest neighbor in the database. This requires O(n) distance computations, where n is the number of items in the database. The approach becomes computationally costly if the database is large. Therefore, several improved nearest neighbor techniques have been proposed. For instance, in the spatial access method [16] each shape is represented by a finite set of labeled features. The distance between a pair of shapes is defined as the Euclidean distance between the moment invariant features [14] or between the Fourier descriptors [12]. The Euclidean distance is used to divide the input elements into spatial clusters, so the inefficient search of distinct clusters in a query can be avoided [6,7]. However, this method suffers from the drawback that its computational complexity increases exponentially with the dimensionality of the feature space.
* Corresponding author. Tel.: +1 973 596 5654; fax: +1 973 596 5777. E-mail address: shih@njit.edu (F.Y. Shih).

0020-0255/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2006.10.008


Two-dimensional moment invariants for planar geometric figures have been investigated. Hu [9] introduced a set of moment invariants using a nonlinear combination based on normalized central moments. Computing these invariants requires engaging all the pixels in the shape. Chen [2] proposed an improved moment invariant technique based only on boundary pixels to speed up the computation. In addition, Chen's normalized moment invariants were designed to be invariant to scaling, translation and rotation [19]. The definitions of Hu's and Chen's moment invariants are briefly introduced in Appendix A.

Shape similarity measures are essential in pattern matching, where the strategy is to transform patterns and to measure their resemblance. The similarity of two patterns is measured as the distance between their feature vectors [15]. An extensive study of similarity measures using matching can be found in Ref. [23]. Methods have been developed to retrieve images using these measures [11,27]. Unfortunately, they are inefficient for a large database. The view-based similarity approach developed by Tangelder and Veltkamp [20] is robust and possesses high discriminative power compared to other shape matching methods, but it is only of medium speed due to its lack of efficient indexing. For example, the shock graph matching method proposed by Sebastian et al. [18] is a view-based similarity approach. Their results, derived from two databases of 99 and 216 shapes (Fig. 1), demonstrate a 100% recognition rate for the top three matches. However, this method is impractical for querying large databases due to the high computational cost of the deformation function between two shock graphs [17].

In this paper, we propose an efficient image retrieval algorithm for large-scale shape databases. We use the Fourier descriptor (FD) as the similarity measure, the support vector machine (SVM) as the pattern classifier, and the self-organizing map (SOM) as the nearest neighbor search method. The overall shape-based image retrieval algorithm is illustrated in Fig. 2. First, we characterize a set of 2D shapes by their FDs. Second, we build an image database containing the shape number, the class label, and the FDs. Third, we use the FDs with class labels to train the SVM. Fourth, we feed the input patterns into the SVM and generate the predicted classes. The SVM with multi-class probability estimates provides a list of predicted classes ranked by likelihood. We then select the best matched image based on the least mean square error distance (LMSED) among the images of the top-ranked class. We provide the user an option to retrieve more matched images. If the predicted class is incorrect, we iterate on the next predicted class in the ranked list. Using this algorithm, we achieve nearly the same accuracy as, but much faster retrieval than, the existing method in Ref. [18].
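To make the retrieval flow concrete, the query stage can be sketched as follows: a trained multi-class SVM with probability estimates ranks the candidate classes, and the best match within the top-ranked class is selected by the least mean square error distance between CeFD vectors. This is a minimal Python sketch; the function name, array layout, and the use of scikit-learn's SVC (a LIBSVM wrapper) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def retrieve(query_cefd, clf, db_cefds, db_labels, top_k=3):
    """Query-stage sketch: rank classes by SVM probability, then pick the
    best-matched database image in the top class by least mean square
    error distance (LMSED) between CeFD vectors.

    clf       : trained classifier exposing predict_proba and classes_
                (e.g. sklearn.svm.SVC(probability=True), which wraps LIBSVM)
    db_cefds  : (n_images, m) CeFD vectors stored in the image database
    db_labels : (n_images,) class label of each stored image
    """
    proba = clf.predict_proba(query_cefd[None, :])[0]
    ranked_classes = clf.classes_[np.argsort(proba)[::-1]][:top_k]
    members = np.where(db_labels == ranked_classes[0])[0]
    dists = np.sqrt(((db_cefds[members] - query_cefd) ** 2).sum(axis=1))
    best_match = members[np.argmin(dists)]
    return ranked_classes, best_match
```

If the user rejects the retrieved images, the same search is repeated on the next class in ranked_classes, mirroring the iteration described above.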

Fig. 1. The two databases of 99 and 216 shapes used by Sebastian [18].


Fig. 2. Flowchart of the overall image retrieval algorithm: a feature extraction stage (obtain CeFDs from the 2D images), a learning stage (SVM training to create and test the SVM model, and generation of an SOM model for each class), and an image retrieval stage (select a class from the ranked class list, retrieve the best matched image and its nearest neighbors from the SOM of the selected class; if the predicted class is unsatisfactory, move to the next class in the list, and stop when the retrieved images are satisfactory).

The rest of this paper is organized as follows. In Section 2, we introduce the Fourier descriptor and moment invariants. We describe the support vector machine and the self-organizing map, respectively, in Sections 3 and 4. We provide experimental results in Section 5. Finally, we draw conclusions in Section 6.

2. Fourier descriptor and moment invariants

The shape signature is defined by a 1D continuous function f(t). The Fourier transform of f(t) is given by Eq. (1). When dealing with discrete images, we use the discrete Fourier transform (DFT). Assuming that the object contour is digitized into N points, the DFT of f(t) is given by Eq. (2), where u ∈ {0, 1, ..., N − 1}.


From Euler's equation, i.e., e^{-j 2\pi u t / N} = \cos(2\pi u t / N) - j \sin(2\pi u t / N), we obtain Eq. (3). The coefficient F(u) is called a Fourier descriptor (FD).

F(u) = \int_{-\infty}^{\infty} f(t) \, e^{-j 2\pi u t} \, dt    (1)

F(u) = \frac{1}{N} \sum_{t=0}^{N-1} f(t) \, e^{-j 2\pi u t / N}    (2)

F(u) = \frac{1}{N} \sum_{t=0}^{N-1} f(t) \left[ \cos(2\pi u t / N) - j \sin(2\pi u t / N) \right]    (3)

Several shape signatures have been proposed to derive Fourier descriptors. Zhang and Lu [29] showed empirically that the Fourier descriptor derived from the centroid distance (CeFD) achieves significantly higher performance than other Fourier descriptors, such as the area FD, curvature FD, psi FD, position (complex) FD, affine FD, and chord-length FD. The centroid distance function f(t) is defined as the distance between the boundary points and the centroid (x_c, y_c), as in Eq. (4). Only half of the Fourier descriptors F(u) are used to index the corresponding shape because f(t) is a set of real values. Therefore, we obtain Eq. (5). The Fourier descriptors can be normalized to be independent of geometric translation, scaling and rotation. A set of invariant descriptors CeFD can be obtained by Eq. (6). Since F(0) is the largest coefficient, we use it as the normalization factor. To reduce dimensionality, we choose the first 10–60 coefficients for classification.

f(t) = \sqrt{ \left[ x(t) - x_c \right]^2 + \left[ y(t) - y_c \right]^2 }    (4)

|F(u)| = \frac{1}{N} \sqrt{ \left[ \sum_{t=0}^{N-1} f(t) \cos(2\pi u t / N) \right]^2 + \left[ \sum_{t=0}^{N-1} f(t) \sin(2\pi u t / N) \right]^2 }    (5)

CeFD_u = \frac{|F(u)|}{|F(0)|}, \quad u = 1, 2, \ldots, N/2    (6)
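As an illustration, the CeFD computation of Eqs. (4)–(6) can be sketched in a few lines of Python/NumPy; the function name, the (N, 2) contour format and the number of retained coefficients are assumptions of this sketch rather than the authors' code.

```python
import numpy as np

def centroid_distance_fd(contour, n_coeffs=32):
    """Centroid-distance Fourier descriptors (CeFD) of a closed contour.

    contour: (N, 2) array of boundary points (x, y).
    Returns the first n_coeffs normalized descriptors |F(u)| / |F(0)|,
    which are invariant to translation, scaling and rotation.
    """
    x, y = contour[:, 0].astype(float), contour[:, 1].astype(float)
    xc, yc = x.mean(), y.mean()                 # shape centroid
    f = np.sqrt((x - xc) ** 2 + (y - yc) ** 2)  # centroid-distance signature, Eq. (4)
    F = np.fft.fft(f) / len(f)                  # DFT of the signature, Eq. (2)
    mag = np.abs(F)                             # descriptor magnitudes, Eq. (5)
    cefd = mag[1:len(f) // 2 + 1] / mag[0]      # normalize by |F(0)|, Eq. (6)
    return cefd[:n_coeffs]
```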

3. Support vector machines (SVM)

The SVM, derived from Vapnik's structural risk minimization (SRM) principle [22], can reduce the empirical risk and quantities based on the bounds of the generalization error. The basic idea is to transform the training data into a higher dimensional space and then to find the hyperplane that maximizes the margin between classes. The simplest model of the SVM classifier is called the maximal margin classifier. As shown in Eq. (7), the SVM classifier attempts to place a linear plane between two classes and to orient this plane in such a way that the margin 2/\|w\| is maximized. The data points nearest to the margin are known as support vectors, which contain all the information needed for the classifier.

\text{Minimize}_{w,b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1, \quad i = 1, 2, \ldots, N, \ x_i \in A \cup B    (7)

When two classes cannot be completely separated, this approach is not feasible. Therefore, the slack variable \xi is introduced to control misclassification when the classes are linearly non-separable. As shown in Eq. (8), C is a parameter controlling the tradeoff between the margin and classification errors in the training set. Another technique for dealing with classes that are not linearly separable involves mapping the data into higher dimensions. In a high-dimensional space it is possible to create a hyperplane that allows linear separation, corresponding to a curved surface in the lower-dimensional input space. However, transforming the data to a higher dimension may cause a loss of generality. Accordingly, the kernel function used, such as linear, polynomial or Gaussian, plays an important role in the class discrimination of the SVM.


Fig. 3. Sebastian's 1048-shape database with 42 categories. Note that the three shapes shown in reverse are omitted in our experiment.


\text{Minimize}_{w,b} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \quad i = 1, 2, \ldots, N, \ x_i \in A \cup B    (8)

In real-world applications there are more than two classes. The multi-class SVM is trained with either a one-against-all or a one-against-one scheme [8]. The one-against-all training method is easy to implement. It constructs k SVM models, where k is the number of classes. The pth SVM classifier is trained using all the elements in the pth class with positive labels and the other elements with negative labels. There are k decision functions to be solved. An instance x_i is assigned to the class whose decision function attains the maximum value. This approach is computationally costly because k quadratic optimization problems (QP) with sample size l need to be solved.

The one-against-one training method involves the construction of binary SVM classifiers for all pairs of classes. This method constructs k(k − 1)/2 classifiers, where each one is trained using data from two classes. This number is usually larger than the number of one-against-all classifiers. For instance, if k = 10, we need to train 45 binary classifiers, compared to 10 using the one-against-all training method. The total training time seems to be longer, but the individual problems are actually smaller because on average each QP problem has about 2l/k variables. The decision function assigns an instance to the class that receives the largest number of votes. This voting approach is called the Max Wins strategy.

For image retrieval, it is desirable to use a special SVM classifier with multi-class probability estimates [25]. When the first predicted class is incorrect, the SVM classifier proceeds to the second class. We adopt the LIBSVM [3] package because it uses sequential minimal optimization (SMO) for the multi-class SVM. The LIBSVM package uses the one-against-one training method because its training time is shorter than that of the one-against-all training method with comparable performance. Moreover, the one-against-one training method provides probability estimates.
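A minimal sketch of this classification step uses scikit-learn's SVC (a wrapper around LIBSVM) with probability estimates enabled; the placeholder data and the mapping of the paper's Gaussian-kernel parameter d² = 5 to gamma = 1/d² are assumptions of this sketch, not the authors' exact setup.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder data: in the paper these would be CeFD vectors and class labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 16)), rng.integers(0, 5, 200)
X_query = rng.random((3, 16))

# SVC wraps LIBSVM; probability=True turns on the pairwise-coupling
# probability estimates used here to rank the candidate classes.
clf = SVC(kernel="rbf", C=1000, gamma=1.0 / 5.0, probability=True)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_query)                         # (n_queries, n_classes)
ranked = clf.classes_[np.argsort(proba, axis=1)[:, ::-1]]  # classes by likelihood
top3 = ranked[:, :3]                                       # ranked candidate class list
```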

Fig. 4. Prediction accuracy (%) versus number of CeFDs (12–32) in SVM cross-validation tests for the 99-shape and 216-shape databases.

Table 1
Average prediction rates on test data using our method

Database      Best match (%)                  Second best match (%)           Third best match (%)
              Ours    Sebastian's   Chen's    Ours    Sebastian's   Chen's    Ours    Sebastian's   Chen's
99 Shapes     90.0    97.0          77.8      96.7    -             77.8      96.7    100           77.8
216 Shapes    96.7    96.0          77.8      98.9    -             83.3      100     100           91.7
1045 Shapes   84.1    -             67.8      94.2    -             81.0      96.3    -             83.9

Note. There are no results for the second best match and for the 1045-shape data set in Sebastian's method.


4. Self-organizing map (SOM)

The Kohonen SOM [10] is an algorithm to visualize and interpret data by projecting it from a high-dimensional space to a low-dimensional space. Basically, it consists of a 2D lattice of nodes. Each node has an N-dimensional feature vector that represents a point in the feature space. The map attempts to represent all the available observations. It loops through the training data to find the closest node in the map. The choice is based on the nearest neighbor principle for the feature vectors associated with the nodes. As a result of training, the feature vectors of nearby nodes are located nearby in the high-dimensional data space. Thus, similar data elements are projected near each other in the SOM, while dissimilar elements are mapped to nodes that are distant from each other in the map.

In our method, the SOM is used to index the images in a class. First, we apply the LMSED to the images in a class to find the best match unit (BMU) for each image. The LMSED of an image Q is described by Eq. (9), where m is the number of CeFDs used and M is another image in the same class.

LMSED(Q) = \min_{M} \sqrt{ \sum_{i=1}^{m} \left( CeFD_{Q_i} - CeFD_{M_i} \right)^2 }    (9)
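A small Python sketch of the LMSED computation of Eq. (9) follows; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def lmsed_bmu(query_cefd, class_cefds):
    """Best match unit (BMU) of a query within one class by Eq. (9).

    query_cefd : (m,) CeFD vector of the query image Q.
    class_cefds: (n_images, m) CeFD vectors of the images M in the class.
    Returns the index of the closest image and its LMSED value.
    """
    dists = np.sqrt(((class_cefds - query_cefd) ** 2).sum(axis=1))
    return int(np.argmin(dists)), float(dists.min())
```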

Table 2
Prediction rates on the training and test data for all five permutations

Database      Rotation   First matching (%)    Second matching (%)   Third matching (%)
                         Train     Test        Train     Test        Train     Test
99 Shapes     Round1     100       94.44       100       94.44       100       94.44
              Round2     100       83.33       100       88.89       100       88.89
              Round3     100       94.44       100       100         100       100
              Round4     100       83.33       100       100         100       100
              Round5     100       94.44       100       100         100       100
              Average    100       90.00       100       96.67       100       96.67
216 Shapes    Round1     100       100         100       100         100       100
              Round2     100       97.22       100       100         100       100
              Round3     100       97.22       100       100         100       100
              Round4     100       94.44       100       97.22       100       100
              Round5     100       94.44       100       97.22       100       100
              Average    100       96.66       100       98.89       100       100.00
1045 Shapes   Round1     96.28     84.91       99.76     93.87       100       95.75
              Round2     96.88     83.49       99.88     94.34       100       96.70
              Round3     96.88     83.49       99.88     94.34       100       96.70
              Round4     96.28     84.91       99.76     93.87       100       95.75
              Round5     96.88     83.49       99.88     94.34       100       96.70
              Average    96.64     84.06       99.83     94.15       100       96.32

Fig. 5. Comparison of our method (SVM with CeFDs) with Chen's moment invariants (SVM with Chen's 4 and Chen's 7 invariants): precision (%) versus number of classes (8–40).


For the training data, we generate one SOM for each class. We keep each image and its BMU as close as possible in the map. After training, we store the SOM in the database. In response to a query, we first calculate the BMU of the new image in the predicted best class. If more images are requested, we search the neighborhood of the queried image in the map. There is no need to record all relationships among the data, so the processing time is reduced.

5. Experimental results

To facilitate comparisons, we adopt the same dataset as in [18]. It contains 42 classes, each including 5–60 different shapes, except two classes, calf and donkey, with fewer than three shapes, which we exclude. Only two sets of 99 and 216 shapes are used in [18]; however, our experiment includes 1045 shapes, as shown in Fig. 3 (disregarding the shapes on a black background). We choose the probability estimate option of the LIBSVM package with the Gaussian kernel function k(x, y) = exp(−‖x − y‖²/d²) to obtain the first three matching classes. This is because the Gaussian kernel function tends to perform better than the polynomial or the sigmoid kernel function in image retrieval, according to [21,28]. To achieve reliable classification in testing, we need a significant amount of training data.
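For reference, the per-class SOM indexing described in Section 4 (and visualized in Figs. 6–8 below) can be sketched with a minimal 2D Kohonen SOM in NumPy; the lattice size and the learning-rate and neighborhood schedules are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def train_som(data, rows=5, cols=5, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a minimal 2D Kohonen SOM on the CeFD vectors of one class.

    data: (n_images, m) CeFD vectors. Returns a (rows, cols, m) weight lattice.
    """
    rng = np.random.default_rng(seed)
    w = rng.random((rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        d = np.linalg.norm(w - x, axis=2)                  # distance to every node
        bmu = np.array(np.unravel_index(np.argmin(d), d.shape))
        lr = lr0 * np.exp(-t / n_iter)                     # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)               # shrinking neighborhood
        h = np.exp(-((grid - bmu) ** 2).sum(axis=-1) / (2 * sigma ** 2))
        w += lr * h[..., None] * (x - w)                   # pull nodes toward x
    return w

def som_bmu(w, x):
    """Grid coordinates of the best matching unit for a query vector x."""
    d = np.linalg.norm(w - x, axis=2)
    return np.unravel_index(np.argmin(d), d.shape)
```

A query is answered by locating its BMU with som_bmu and, if more results are requested, returning the images attached to neighboring lattice nodes.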

Fig. 6. SOM training results for textbox category.


Therefore, we use 80% of the data for training and the remaining 20% for testing. To choose the best parameters, we apply 10-fold cross-validation on the training data. We consider d² ∈ {5, 4, 3, 2, 1} and C ∈ {1000, 500, 100, 50, 10}. To speed up the process, we select six candidate numbers of CeFDs based on the ranges of d² and C.

Fig. 7. SOM training results for tool category.

Fig. 8. SOM training results for fish category.


Table 3
The BMU of each image in textbox, tool, and fish (image → BMU)

Textbox: textbox1 → textbox25, textbox3 → textbox27, textbox4 → textbox41, textbox5 → textbox29, textbox7 → textbox31, textbox8 → textbox29, textbox9 → textbox10, textbox10 → textbox9, textbox11 → textbox9, textbox12 → textbox10, textbox14 → textbox13, textbox16 → textbox23, textbox17 → textbox11, textbox18 → textbox37, textbox19 → textbox44, textbox20 → textbox23, textbox21 → textbox45, textbox22 → textbox16, textbox23 → textbox20, textbox24 → textbox41, textbox25 → textbox27, textbox26 → textbox5, textbox27 → textbox3, textbox29 → textbox5, textbox31 → textbox7, textbox32 → textbox34, textbox33 → textbox34, textbox34 → textbox33, textbox37 → textbox43, textbox39 → textbox41, textbox40 → textbox45, textbox41 → textbox45, textbox42 → textbox43, textbox43 → textbox42, textbox44 → textbox43, textbox45 → textbox41, textbox47 → textbox48, textbox48 → textbox8, textbox49 → textbox59, textbox50 → textbox54, textbox52 → textbox57, textbox54 → textbox58, textbox55 → textbox49, textbox56 → textbox55, textbox57 → textbox52, textbox58 → textbox54, textbox59 → textbox49, textbox60 → textbox52.

Tool: tool02 → tool11, tool03 → tool10, tool04 → tool05, tool05 → tool04, tool07 → tool04, tool08 → tool14, tool09 → tool19, tool10 → tool03, tool11 → tool02, tool12 → tool11, tool13 → tool20, tool14 → tool08, tool17 → tool22, tool18 → tool20, tool19 → tool09, tool20 → tool23, tool21 → tool19, tool22 → tool17, tool23 → tool20, tool24 → tool21, tool25 → tool31, tool28 → tool21, tool29 → tool32, tool30 → tool33, tool31 → tool25, tool32 → tool29, tool33 → tool30, tool34 → tool33, tool35 → tool29, tool38 → tool31, tool39 → tool40, tool40 → tool39, tool41 → tool40.

Fish: fish1 → fish4, fish2 → fish49, fish3 → fish24, fish4 → fish25, fish5 → fish7, fish6 → fish31, fish7 → fish46, fish8 → fish9, fish9 → fish49, fish10 → fish57, fish11 → fish43, fish12 → fish59, fish14 → fish34, fish15 → fish18, fish16 → fish30, fish17 → fish31, fish18 → fish15, fish20 → fish57, fish22 → fish16, fish23 → fish25, fish24 → fish23, fish25 → fish23, fish26 → fish25, fish28 → fish5, fish30 → fish44, fish31 → fish17, fish32 → fish37, fish33 → fish50, fish34 → fish38, fish36 → fish33, fish37 → fish43, fish38 → fish40, fish40 → fish38, fish41 → fish42, fish42 → fish44, fish43 → fish11, fish44 → fish47, fish46 → fish9, fish47 → fish44, fish49 → fish9, fish50 → fish33, fish52 → fish53, fish53 → fish52, fish54 → fish52, fish55 → fish57, fish57 → fish10, fish58 → fish11, fish59 → fish12.

Table 4
Performance comparisons of SGI Indigo II, SGI Origin 2000, and Pentium 4

Testing case from reporter   SGI Indigo II 195 MHz (MFLOPS/s)   SGI Origin 2000 195 MHz (MFLOPS/s)   Pentium 4 2.53 GHz (MFLOPS/s)
J.J. Dongarra                56.3                               91.4                                 -
A. Brian                     -                                  356                                  3210


The six candidate numbers of CeFDs are 12, 16, 20, 24, 28 and 32. The prediction rates converge across these six settings with d² = 5 and C = 1000, as shown in Fig. 4. Note that the recognition accuracy for the 99-shape database is slightly lower than for the 216-shape database, since about 10% of the 99 shapes are highly distorted objects. The result shows that the CeFD is a suitable shape signature in terms of robustness, computational complexity and convergence speed, as discussed by Zhang and Lu [29].

To provide sufficient amounts of training data, we use four groups for training and one group for testing among the five groups of data. The role of each group for training and testing is permuted. A comparison of our method with Sebastian's and Chen's methods is summarized in Table 1. The prediction rates on the training and testing data for all five permutations are listed in Table 2. We observe that the last three of Chen's moment invariants are unstable and produce over-fitting. The prediction accuracy in the cross-validation tests for all the 1045 shapes is shown in Fig. 5. The CeFD produces roughly 10% higher accuracy than Chen's moment invariants.

In all cases, we generated the SOMs successfully because each image is located either in the same node as its BMU or in its 5 × 5 neighborhood. Due to space limitations, we only show a few SOM training results. Figs. 6–8, respectively, show the results of the SOM training on textbox, tool and fish, where the elements in the tables are located according to the mutual distance relationships of their shapes. Table 3 lists the BMU of each image in textbox, tool and fish. For example, the BMUs of the images named textbox1, textbox3 and textbox5 are textbox25, textbox27 and textbox29, respectively.

Sebastian et al. [18], implementing their algorithm on an SGI Indigo II 195 MHz computer, spent 2 s to compare two shock-based graphs. They spent 180 and 300 s to retrieve the best matched image from the 99-shape and 216-shape databases, respectively, and a few hours to build the 216-shape database for image retrieval. In our experiment, we used an Intel Pentium 1.6 GHz PC and spent 1 s to classify 208 test data and less than 6 min to build the 1024-shape database. Since the computing platforms are different, we use MFLOPS (million floating-point operations per second) for performance comparisons. We list the performance comparisons of the SGI Indigo II, SGI Origin 2000 and Pentium 4 in Table 4. From Dongarra [5], the SGI Origin 2000 (195 MHz) is about 1.62 times faster than the SGI Indigo II (195 MHz), and from Brian [1], the Intel Pentium 1.6 GHz PC is about 9.02 times faster than the SGI Origin 2000. Therefore, the Intel Pentium 1.6 GHz PC we used is about 15 times faster than the SGI Indigo II. However, the total time required by our algorithm is at least 150 times shorter than the time required by Sebastian's algorithm.

6. Conclusions

We have presented a novel approach to shape-based image retrieval using Fourier descriptors, support vector machines, and self-organizing maps. Unlike existing methods, we use the SVM with probability estimates to first obtain the most likely classes and then retrieve similar images based on the best match from the SOM of the selected class. Unlike the shock graph matching approach, our method does not need to perform a linear search over all the images in the database. Experimental results show that our method outperforms some existing shape-based methods in terms of speed and accuracy.
Acknowledgement

This work is partially supported by the National Science Council, Taiwan, ROC, under Grant NSC 94-2213-E-216-024.

Appendix A

Let f(x, y) be a continuous shape-based image function. The (p, q)th central moment of f(x, y) is defined as

\mu_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - x_c)^p (y - y_c)^q f(x, y) \, dx \, dy


where (x_c, y_c) are the coordinates of the centroid of the shape. The normalized (p, q)th central moment is defined as

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \text{where } \gamma = \frac{p + q}{2} + 1 \text{ and } p + q = 2, 3, \ldots

Based on these moments, Hu [9] derived the following seven moment invariants \Phi_i, where i = 1, 2, \ldots, 7:

\Phi_1 = \eta_{20} + \eta_{02}
\Phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2
\Phi_3 = (\eta_{30} - 3\eta_{12})^2 + (\eta_{03} - 3\eta_{21})^2
\Phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{03} + \eta_{21})^2
\Phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]
\Phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})
\Phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]

Chen [2] improved the central moments by applying the moment computation only on the boundary. Chen's (p, q)th central moment is defined as

\mu_{pq} = \int_{S} (x - x_c)^p (y - y_c)^q \, ds

For a digital shape-based image, the discrete form is

\mu_{pq} = \sum_{(x, y) \in S} (x - \bar{x})^p (y - \bar{y})^q

Chen's normalized (p, q)th central moment is defined as

\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \text{where } \gamma = p + q + 1 \text{ and } p + q = 2, 3, \ldots
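For illustration, Hu's seven invariants above can be computed directly from a binary shape image with NumPy; the function name and the binary-image input format are assumptions of this sketch, not part of the paper.

```python
import numpy as np

def hu_moments(image):
    """Hu's seven moment invariants of a binary shape image (Appendix A).

    image: 2-D array with non-zero pixels inside the shape.
    """
    y, x = np.nonzero(image)
    w = image[y, x].astype(float)
    m00 = w.sum()
    xc, yc = (x * w).sum() / m00, (y * w).sum() / m00

    def mu(p, q):                       # central moment
        return ((x - xc) ** p * (y - yc) ** q * w).sum()

    def eta(p, q):                      # normalized central moment
        return mu(p, q) / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    phi5 = ((n30 - 3 * n12) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            + (3 * n21 - n03) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    phi6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
            + 4 * n11 * (n30 + n12) * (n21 + n03))
    phi7 = ((3 * n21 - n03) * (n30 + n12)
            * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
            - (n30 - 3 * n12) * (n21 + n03)
            * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6, phi7])
```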
References

[1] A. Brian, Coral, 1999, <http://www.icase.edu/coral/Feb10.99.html>.
[2] C.-C. Chen, Improved moment invariants for shape discrimination, Patt. Recogn. 26 (5) (1993) 683–686.
[3] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines, 2006, <http://www.csie.ntu.edu.tw/~cjlin/libsvm/>.
[4] S. Climer, S.K. Bhatia, Image database indexing using JPEG coefficients, Patt. Recogn. 35 (11) (2002) 2479–2488.
[5] J. Dongarra, Performance of Various Computers in Computational Chemistry, 2005, <http://www.cfs.dl.ac.uk/benchmarks/compchem.html>.
[6] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, W. Equitz, Efficient and effective querying by image content, Intell. Info. Syst. 3 (3–4) (1994) 231–262.
[7] W. Groski, R. Mehrota, Index-based object recognition in pictorial data management, Comput. Vision Graph. Image Process. 52 (3) (1990) 416–436.
[8] C.-W. Hsu, C.-J. Lin, A comparison of methods for multi-class support vector machines, IEEE Trans. Neural Networks 13 (2) (2002) 415–425.
[9] M.-K. Hu, Visual pattern recognition by moment invariants, IRE Trans. Inform. Theor. 8 (1962) 179–187.
[10] T. Kohonen, Self-organization and Associative Memory, second ed., Springer, Berlin, 1988.
[11] L. Latecki, R. Lakämper, Application of planar shape comparison to object retrieval in image databases, Patt. Recogn. 35 (11) (2002) 15–29.
[12] C. Lin, R. Chellappa, Classification of partial 2-D shapes using Fourier descriptors, IEEE Trans. Patt. Anal. Mach. Intel. 9 (5) (1987) 686–690.
[13] H. Nishida, Structural feature indexing for retrieval of partially visible shapes, Patt. Recogn. 35 (1) (2002) 55–67.
[14] E. Rivlin, I. Weiss, Local invariants for recognition, IEEE Trans. Patt. Anal. Mach. Intel. 17 (3) (1995) 226–238.
[15] S. Santini, R. Jain, Similarity measures, IEEE Trans. Patt. Anal. Mach. Intel. 21 (9) (1999) 871–882.


[16] T. Sebastian, B. Kimia, Metric-based shape retrieval in large databases, in: Proceedings of the International Conference on Pattern Recognition, Quebec City, Canada, 2002, pp. 291–296.
[17] T. Sebastian, P. Klein, B. Kimia, Shock-based indexing into large shape databases, in: Proceedings of the European Conference on Computer Vision, Copenhagen, Denmark, 2002, pp. 731–746.
[18] T. Sebastian, P. Klein, B. Kimia, Recognition of shapes by editing their shock graphs, IEEE Trans. Patt. Anal. Mach. Intel. 26 (5) (2004) 550–571.
[19] Y. Sun, W. Liu, Y. Wang, United moment invariants for shape discrimination, in: Proceedings of the IEEE International Conference on Robotics, Intelligent Systems and Signal Processing, Changsha, China, 2003, pp. 88–93.
[20] J. Tangelder, R. Veltkamp, A survey of content based 3D shape retrieval methods, in: Proceedings of the International Conference on Shape Modeling and Applications, Genova, Italy, 2004, pp. 145–156.
[21] S. Tong, E. Chang, Support vector active learning for image retrieval, in: Proceedings of the Conference on ACM Multimedia, Ottawa, Canada, 2001, pp. 107–118.
[22] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[23] R. Veltkamp, Shape matching: similarity measures and algorithms, in: Proceedings of the International Conference on Shape Modeling and Applications, Genova, Italy, 2001, pp. 188–197.
[24] Y. Wang, Image indexing and similarity retrieval based on spatial relationship model, Inform. Sci. 154 (1–2) (2003) 39–58.
[25] T.-F. Wu, C.-J. Lin, R.-C. Weng, Probability estimates for multi-class classification by pairwise coupling, Mach. Learn. Res. 5 (2004) 975–1005.
[26] R. Yager, F. Petry, A framework for linguistic relevance feedback in content-based image retrieval using fuzzy logic, Inform. Sci. 173 (4) (2005) 337–352.
[27] H. Ye, G. Xu, Similarity measure learning for image retrieval using feature subspace analysis, in: Proceedings of the International Conference on Computational Intelligence and Multimedia Applications, Xi'an, China, 2003, pp. 131–136.
[28] L. Zhang, F. Lin, B. Zhang, Support vector machine for image retrieval, in: Proceedings of the IEEE International Conference on Image Processing, Thessaloniki, Greece, 2001, pp. 721–724.
[29] D. Zhang, G. Lu, Study and evaluation of different Fourier methods for image retrieval, Image Vision Comput. 23 (1) (2005) 33–49.
[30] X. Zhou, T. Huang, Relevance feedback in content-based image retrieval: some recent advances, Inform. Sci. 148 (1–4) (2002) 129–137.
