
Tạp chí Khoa học Trường Đại học Cần Thơ (Can Tho University Journal of Science)

Part A: Natural Sciences, Technology and Environment: 27 (2013): 64-71

HANDWRITTEN DIGIT RECOGNITION USING A MACHINE LEARNING ALGORITHM


Đỗ Thanh Nghị¹ and Phạm Nguyên Khang¹

¹ College of Information and Communication Technology, Can Tho University

Article information:
Received: 17/04/2013
Accepted: 19/08/2013
Title: Handwritten digit recognition using GIST descriptors and random oblique decision trees
Keywords: Handwritten Digit Recognition, GIST Descriptor, Random Oblique Decision Trees, Linear Discriminant Analysis

ABSTRACT
This study constructs random oblique decision trees to recognize handwritten digits. In the pre-processing step, we propose using GIST descriptors to represent digit images in a high-dimensional feature space. We then propose a multi-class version of random oblique decision trees, based on linear discriminant analysis, that is suited to classifying high-dimensional datasets. Experimental results on the MNIST dataset show that our proposal reaches 99.12% accuracy, comparable to state-of-the-art algorithms.

SUMMARY
In this paper, we present a machine learning algorithm, the random forest of oblique decision trees (rODT), for handwritten digit recognition. We propose using global (GIST) features to represent digit images in a high-dimensional space. We then propose a multi-class random oblique forest learning algorithm in which each member tree uses a hyperplane, obtained from linear discriminant analysis (LDA), to split the data efficiently at each node. Building random oblique trees in this way lets the algorithm work well on the high-dimensional data produced by the pre-processing step. Experimental results on the real MNIST dataset show that the proposed rODT algorithm recognizes digits very accurately when compared with current recognition methods.

1 INTRODUCTION

Handwritten digit recognition is necessary and widely applied in many areas, such as reading the digits on bank cheques, postal codes on envelopes, or digits on forms in general. Handwriting recognition in general, and handwritten digit recognition in particular, remains a major challenge for researchers. The difficulty persists because recognition depends heavily on the writing style and on how each writer expresses the language: nobody writes a given character in exactly the same way every time. Building a recognition system that can reliably recognize any character in every application is therefore not easy.

A recognition system usually consists of two steps: extracting features from images, and learning automatically from those features so that characters can be recognized. The effectiveness of the system depends on the methods used in these two stages. Most current systems (LeCun et al., 1998), (Simard et al., 2003), (Kégl & Busa-Fekete, 2009) use basic features of the character image such as contours, edges, stroke thickness, grey-level values and haar-like features, together with specific processing such as sampling, pixel jittering, image transformations and the addition of artificial data. The recognition system then trains machine learning models such as k nearest neighbors (kNN), neural networks, support vector machines (SVM) or boosting.

The system we propose in this paper performs two steps: it uses global (GIST) features to represent digit images in a high-dimensional space (960 features, i.e. dimensions, per image), and it trains a multi-class random oblique forest based on linear discriminant analysis (LDA) to recognize the digits efficiently. Experimental results on the real MNIST dataset (LeCun & Cortes, 1989) show that the proposed method trains and recognizes quickly and accurately when compared with existing methods.

The rest of the paper is organized as follows: section 2 briefly presents GIST feature extraction from images, and section 3 presents our proposed ODT algorithm. Section 4 presents the experimental results, followed by the conclusion and future work.

2 FEATURE EXTRACTION

In a recognition system, the feature extraction step is very important and strongly affects how well the learning model can be trained. The features extracted from an image must serve one key purpose: based on them, the learning algorithm should be able to distinguish one digit from another as well as possible. Pioneering studies in recognition (LeCun et al., 1998), (Simard et al., 2003), (Kégl & Busa-Fekete, 2009) all use basic features, from the lowest level (the grey-level value of each pixel), through contours, edges, stroke thickness and haar-like histograms, to other special processing methods such as sampling, pixel jittering and image transformations.

In recent years, the computer vision and image retrieval communities have paid particular attention to two very effective kinds of features: the scale-invariant local features (SIFT) of (Lowe, 2004) and the global GIST features of (Oliva & Torralba, 2001). SIFT descriptors extracted from an image have important properties: they are invariant to scaling, translation and rotation, partially invariant to affine geometric transformations (changes of viewpoint), and robust to changes in illumination, occlusion and noise. However, the rotation invariance of SIFT is a disadvantage for digit recognition (the digits 9 and 6 can look the same). Moreover, SIFT yields very few features from a digit image (fewer than 10). Global GIST features do not suffer from these difficulties. For this reason, we use global GIST features to address handwritten digit recognition.

The GIST method extracts from an image a set of important characteristics, such as naturalness, openness, roughness and ruggedness, that describe the spatial structure of a scene. To compute the GIST descriptor, the input image is resized to a square and divided into a 4 x 4 grid, and the corresponding orientation histograms are extracted. The extraction principle is based on Gabor transforms at different orientations and frequencies. GIST features are extracted from each digit image (a 960-dimensional vector). After this step, the image dataset becomes a table, or matrix, in which each image is a row with 960 columns (dimensions) and each digit carries a label (the corresponding class 0, 1, ..., 9).

3 RANDOM FOREST OF OBLIQUE DECISION TREES FOR MULTI-CLASS CLASSIFICATION

The pre-processing step, which extracts features from digit images, produces a high-dimensional dataset. The classification algorithm chosen next must handle high-dimensional data well. In an earlier study (Do et al., 2009), we proposed the random forest of oblique decision trees, RF-ODT, for efficient classification of high-dimensional data. It extends the RF-CART algorithm proposed by (Breiman, 2001).

Following (Breiman, 1996, 2001), the performance of a learning algorithm rests on two error components, bias and variance: bias is the error of the learned model relative to the Bayes classifier, and variance is the error due to the variability of the model with respect to the randomness of the data samples. In that work, many classification models are combined into an ensemble to obtain higher accuracy than a single model. Breiman's RF-CART algorithm builds a set of decision trees that are both highly accurate and diverse (low correlation between member trees). To keep the bias low, RF-CART grows trees to maximum depth without pruning. To keep the correlation between trees low, RF-CART uses bootstrap sampling (sampling with replacement) from the original dataset to build each member tree, and selects a random subset of attributes from which to compute the best split at each internal node. RF-CART achieves high accuracy compared with the best current classification algorithms, including Boosting (Freund & Schapire, 1995) and SVM (Vapnik, 1995). It also learns quickly and tolerates noise well.

However, tree construction in RF-CART selects only a single dimension to split the data at each node, as proposed earlier in (Breiman et al., 1984), (Quinlan, 1993). As a result, the accuracy of the tree model degrades on datasets with many, mutually dependent dimensions. For example, in Figure 1, no single-attribute split (parallel to a coordinate axis) can separate the data into two classes in one step; several splits are required. A multivariate split (oblique, combining two attributes) can separate the data perfectly with a single split. The single-attribute splits used to build ordinary trees are therefore inefficient in this case.

Figure 1: Single-attribute split (left), multi-attribute (oblique) split (right)
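The contrast illustrated in Figure 1 can be reproduced on a tiny example. The sketch below is illustrative only: the eight points are made up (they are not the data behind the figure), and `split_errors` is a hypothetical helper that reports the fewest misclassifications achievable with one threshold on a one-dimensional score.

```python
# Made-up toy data: class +1 lies above the line x1 + x2 = 0, class -1 below it.
points = [(-2.0, 3.0), (3.0, -2.0), (1.0, 1.0), (-1.0, 2.0),
          (2.0, -3.0), (-3.0, 2.0), (-1.0, -1.0), (1.0, -2.0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

def split_errors(scores, labels):
    """Fewest misclassifications over all single thresholds on a 1-D score."""
    best = len(labels)
    for t in sorted(set(scores)):
        # Predict +1 when score >= t; also consider the flipped orientation.
        wrong = sum((s >= t) != (y == 1) for s, y in zip(scores, labels))
        best = min(best, wrong, len(labels) - wrong)
    return best

# Axis-parallel splits use one attribute; the oblique split uses w = (1, 1).
axis_x1 = split_errors([p[0] for p in points], labels)
axis_x2 = split_errors([p[1] for p in points], labels)
oblique = split_errors([p[0] + p[1] for p in points], labels)

print("errors with a split on x1:", axis_x1)
print("errors with a split on x2:", axis_x2)
print("errors with the oblique split x1 + x2:", oblique)
```

On this toy set neither single-attribute threshold separates the classes, while one threshold on the combined score does, which is the situation that motivates oblique splits.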

To overcome this drawback, many decision tree algorithms that use multi-attribute (oblique) splits at the nodes have been proposed. (Murthy et al., 1993) introduced OC1, a system for building oblique decision trees that uses a hill-climbing algorithm to find a good oblique split in the form of a hyperplane. Building an optimal oblique decision tree is known to be an NP-hard problem. RF-ODT (Do et al., 2009) builds random oblique trees from the optimal hyperplane (an efficient split with good noise tolerance) obtained by training an SVM. However, although effective, finding the optimal SVM hyperplane is computationally expensive. To reduce the complexity of the implementation, we propose replacing the SVM with linear discriminant analysis (LDA) and extending the approach to multi-class classification (more than 2 classes). For binary (2-class) classification, the main idea of LDA (Fisher, 1936) is to find a hyperplane such that, once the data are projected onto it, the separation between the means of the 2 classes is largest and the overlap between the 2 classes is smallest.

Figure 2: Illustration of the vector (w) used to project two-dimensional data

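Briefly, consider a linear binary classification example (circles and squares) as in Figure 2, with m data points x_i (i = 1, ..., m) in an n-dimensional space. The dataset is divided into 2 classes, R1 (with N1 elements) and R2 (with N2 elements). To find the optimal projection vector (w), we compute the following.

Mean (centroid) of each class:

$$m_1 = \frac{1}{N_1} \sum_{x_i \in R_1} x_i, \qquad m_2 = \frac{1}{N_2} \sum_{x_i \in R_2} x_i \tag{1}$$

Projection of the two class centroids m1, m2 onto the vector w:

$$\tilde{m}_1 = \frac{1}{N_1} \sum_{x_i \in R_1} w^T x_i = w^T m_1, \qquad \tilde{m}_2 = \frac{1}{N_2} \sum_{x_i \in R_2} w^T x_i = w^T m_2 \tag{2}$$

Distance between m1 and m2 after projection onto w (the linear separation):

$$|\tilde{m}_2 - \tilde{m}_1| = |w^T (m_2 - m_1)| \tag{3}$$

Scatter of the data of the 2 classes after projection, where y_i = w^T x_i is the projection of x_i:

$$\tilde{s}_1^2 = \sum_{y_i : x_i \in R_1} (y_i - \tilde{m}_1)^2 = \sum_{x_i \in R_1} (w^T x_i - w^T m_1)^2 = w^T S_1 w$$

$$\tilde{s}_2^2 = \sum_{y_i : x_i \in R_2} (y_i - \tilde{m}_2)^2 = \sum_{x_i \in R_2} (w^T x_i - w^T m_2)^2 = w^T S_2 w \tag{4}$$

where S1 and S2 are:

$$S_1 = \sum_{x_i \in R_1} (x_i - m_1)(x_i - m_1)^T, \qquad S_2 = \sum_{x_i \in R_2} (x_i - m_2)(x_i - m_2)^T \tag{5}$$

With S_w the within-class scatter matrix and S_B the between-class scatter matrix:

$$S_w = S_1 + S_2 \tag{6}$$

$$S_B = (m_2 - m_1)(m_2 - m_1)^T \tag{7}$$

the ratio between the linear separation and the total scatter is:

$$f(w) = \frac{(\tilde{m}_2 - \tilde{m}_1)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \frac{w^T S_B w}{w^T S_w w} \tag{8}$$

The goal of LDA is to find the w that maximizes f(w), which leads to solving the generalized eigenvalue problem in (9):

$$S_w^{-1} S_B w = \lambda w \tag{9}$$

Because S_B w always points along m_2 - m_1, when S_w is invertible the solution is, up to scale, w ∝ S_w^{-1} (m_2 - m_1).

LDA finds the optimal hyperplane w for an oblique split by evaluating the linear equations above. Note that an oblique decision tree applies such hyperplanes many times, down to the leaf nodes, rather than performing a single split. For this reason, when the separation of the data cannot be captured by the two centroids m1 and m2 alone (the non-linear case), the oblique tree can still handle the situation.

So far we have focused only on binary (2-class) classification. To extend the algorithm to multi-class classification (more than 2 classes), the key issue is to reduce the problem to a 2-class form so that LDA can be applied as described above. To do this, we propose a hierarchical model. Suppose that at a node of the oblique tree there are c classes (c > 2). We propose creating 2 classes (a positive class and a negative class), each containing data from several of the original classes: the data of classes that are close to each other are gathered into one of the two classes, positive or negative. The data at the node are then back to a binary classification problem, and the LDA formulas above can be applied. The process continues until the data are completely partitioned.

Our proposed random forest of oblique decision trees (rODT), for a classification problem on a dataset of m data points x_i (i = 1, ..., m) in an n-dimensional space, proceeds as described in Figure 3. An oblique decision tree (denoted ODT_k) in a random forest of k trees is built as follows:

- The training set B_k consists of m data elements sampled with replacement from the original dataset.
- At each node of the tree, n' dimensions are chosen at random (n' < n), and the oblique split is computed on these n' dimensions (using LDA as described above).
- The tree is grown to maximum depth without pruning.

The rODT forest classifies an element x by majority vote over the classifications obtained from the member trees.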
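The two-class LDA direction defined by equations (1) to (9) can be sketched in a few lines. This is a minimal illustration, not the authors' C/C++ implementation: it uses the standard closed form w ∝ S_w^{-1}(m_2 - m_1) for the two-class case, a small ridge term added to S_w to keep it invertible (an assumption, not part of the paper), and made-up two-dimensional data. The helper names `solve`, `mean`, `scatter` and `lda_direction` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def mean(X):
    """Class centroid, equation (1)."""
    return [sum(col) / len(X) for col in zip(*X)]

def scatter(X, m):
    """S = sum_i (x_i - m)(x_i - m)^T, equation (5)."""
    n = len(m)
    S = [[0.0] * n for _ in range(n)]
    for x in X:
        d = [xi - mi for xi, mi in zip(x, m)]
        for a in range(n):
            for b in range(n):
                S[a][b] += d[a] * d[b]
    return S

def lda_direction(R1, R2, ridge=1e-6):
    """Return w proportional to S_w^{-1} (m_2 - m_1); `ridge` is an assumption."""
    m1, m2 = mean(R1), mean(R2)
    S1, S2 = scatter(R1, m1), scatter(R2, m2)
    n = len(m1)
    Sw = [[S1[a][b] + S2[a][b] + (ridge if a == b else 0.0)
           for b in range(n)] for a in range(n)]      # equation (6)
    return solve(Sw, [q - p for p, q in zip(m1, m2)])

# Made-up data: the classes overlap on each axis but not along w.
R1 = [(0.0, 1.0), (1.0, 2.2), (2.0, 2.8), (3.0, 4.0)]
R2 = [(1.0, 0.0), (2.0, 0.8), (3.0, 2.2), (4.0, 3.0)]
w = lda_direction(R1, R2)

def project(x):
    return sum(wi * xi for wi, xi in zip(w, x))

print("w =", w)
print("projected R1:", [round(project(x), 2) for x in R1])
print("projected R2:", [round(project(x), 2) for x in R2])
```

Both classes span overlapping ranges on x1 and on x2, yet their projections onto w fall into two disjoint intervals, so a single threshold on w^T x separates them.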


Figure 3: The random forest of oblique decision trees algorithm (rODT)
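The overall rODT procedure (bootstrap sample, random subset of n' dimensions at each node, LDA split, full-depth growth, majority vote) can be sketched as below. This is a hedged toy version, not the authors' C/C++ code. Two details are assumptions that the text does not specify: classes are grouped into the positive and negative super-classes by seeding with the two farthest class centroids, and the split threshold is placed halfway between the two projected centroids. A ridge term is also added to S_w for numerical stability. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def group_classes(X, y):
    """ASSUMPTION: seed two super-classes with the two farthest class
    centroids, then attach every class to the nearer seed."""
    cent = {c: centroid([x for x, t in zip(X, y) if t == c]) for c in set(y)}
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(cent[a], cent[b]))
    classes = sorted(cent)
    pairs = [(a, b) for a in classes for b in classes if a < b]
    s1, s2 = max(pairs, key=lambda p: dist(*p))
    return {c for c in classes if dist(c, s2) < dist(c, s1)}  # positive side

def lda_split(X, is_pos, dims, ridge=1e-3):
    """Two-class LDA restricted to `dims`; returns (w, threshold)."""
    neg = [[x[d] for d in dims] for x, t in zip(X, is_pos) if not t]
    pos = [[x[d] for d in dims] for x, t in zip(X, is_pos) if t]
    m1, m2 = centroid(neg), centroid(pos)
    k = len(dims)
    Sw = [[ridge if a == b else 0.0 for b in range(k)] for a in range(k)]
    for rows, m in ((neg, m1), (pos, m2)):
        for r in rows:
            d = [v - mv for v, mv in zip(r, m)]
            for a in range(k):
                for b in range(k):
                    Sw[a][b] += d[a] * d[b]
    w = solve(Sw, [q - p for p, q in zip(m1, m2)])
    # ASSUMPTION: cut halfway between the two projected centroids.
    thr = sum(wi * (p + q) / 2 for wi, p, q in zip(w, m1, m2))
    return w, thr

def project(w, dims, x):
    return sum(wi * x[d] for wi, d in zip(w, dims))

def build(X, y, n_sub, rng):
    """Grow one oblique tree to full depth (no pruning)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1:
        return majority                          # pure leaf
    positive = group_classes(X, y)
    is_pos = [c in positive for c in y]
    if all(is_pos) or not any(is_pos):
        return majority                          # no two super-classes found
    dims = rng.sample(range(len(X[0])), n_sub)   # random n' < n dimensions
    w, thr = lda_split(X, is_pos, dims)
    side = [project(w, dims, x) > thr for x in X]
    if all(side) or not any(side):
        return majority                          # degenerate split: stop
    lo = [(x, c) for x, c, s in zip(X, y, side) if not s]
    hi = [(x, c) for x, c, s in zip(X, y, side) if s]
    return (dims, w, thr,
            build([x for x, _ in lo], [c for _, c in lo], n_sub, rng),
            build([x for x, _ in hi], [c for _, c in hi], n_sub, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):               # internal nodes are tuples
        dims, w, thr, lo, hi = node
        node = hi if project(w, dims, x) > thr else lo
    return node

def train_forest(X, y, n_trees=15, n_sub=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample B_k
        forest.append(build([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def predict(forest, x):
    """Majority vote over the member trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Made-up 3-class toy data: three noisy clusters in 3 dimensions.
gen = random.Random(1)
centers = {0: (0, 0, 0), 1: (10, 0, 10), 2: (0, 10, 10)}
X = [[c + gen.uniform(-1, 1) for c in centers[k]] for k in centers for _ in range(30)]
y = [k for k in centers for _ in range(30)]
forest = train_forest(X, y)
accuracy = sum(predict(forest, x) == t for x, t in zip(X, y)) / len(X)
print("training accuracy on the toy data:", accuracy)
```

The toy clusters are far apart, so the sketch only demonstrates the control flow; it says nothing about accuracy on MNIST, where the paper uses 100 trees on 960-dimensional GIST vectors.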

4 EXPERIMENTAL RESULTS

In the experiments, we use the MNIST dataset provided by (LeCun & Cortes, 1989), which is commonly used to evaluate handwritten digit recognition algorithms. MNIST originates from the NIST dataset provided by the National Institute of Standards and Technology (NIST); it was later updated by LeCun and divided into 2 separate sets. The training set consists of 60,000 images of handwritten digits, each 28 x 28 pixels, used to train the machine learning model. All images in the training set are aligned and converted into data points: 60,000 elements (digits) with 784 dimensions, the grey-level values of the pixels, in 10 classes (0 to 9). The test set consists of 10,000 images of handwritten digits used for testing; its images are likewise aligned and converted into 10,000 data points in 784 dimensions, in 10 classes (0 to 9).

To evaluate the proposed method (rODT, GIST), we use the program of (Douze et al., 2009) to extract features, and we implemented the rODT algorithm in C/C++. We compare the performance of (rODT and GIST) with current algorithms such as AdaBoost.M1 (Freund & Schapire, 1995), (Witten & Frank, 2005), LibSVM (Chang & Lin, 2001), (Vapnik, 1995), and the convolutional neural network CNN (Simard et al., 2003), (O'Neill, 2006). All results were obtained on a personal computer (Intel 3GHz, 2GB RAM) running Linux. The accuracies obtained are presented in Table 1. Reference results from the methods of (LeCun et al., 1998) and (Kégl & Busa-Fekete, 2009) are also included in the table.
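The conversion of a 28 x 28 image into a 784-dimensional data point described above amounts to reading the grey levels row by row. The sketch below illustrates only that layout; the `flatten` helper and the blank stand-in image are hypothetical, and loading the actual MNIST files is not shown.

```python
def flatten(image):
    """Turn a 28 x 28 grid of grey levels into one 784-dimensional row."""
    if len(image) != 28 or any(len(r) != 28 for r in image):
        raise ValueError("expected a 28 x 28 image")
    return [float(p) for row in image for p in row]

# Hypothetical stand-in for one MNIST image: blank canvas, one bright pixel.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255
label = 7                      # made-up class label in 0..9

row = flatten(image)
dataset = [(row, label)]       # one (features, label) pair per image

print(len(row))                # number of dimensions per data point
print(row[3 * 28 + 5])         # pixel (3, 5) lands at index 3 * 28 + 5
```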


Table 1: Recognition results on the MNIST handwritten digit dataset

No.  Method                                                             Accuracy (%)
1    1-layer Neural nets (LeCun et al., 1998)                           88.00
2    Nearest-neighbor (Euclidean L2) (LeCun et al., 1998)               95.00
3    Convolution net LeNet-1 (LeCun et al., 1998)                       98.30
4    Convolution net LeNet-4* (LeCun et al., 1998)                      98.90
5    Convolution net LeNet-5* (LeCun et al., 1998)                      99.15
6    Convolution Neural Net (CNN)* (Simard et al., 2003)                99.10
7    LIBSVM (RBF, γ = 0.05, c = 10^5)                                   98.37
8    LIBSVM (Poly, deg = 5, c = 10^5)                                   96.65
9    AdaBoost.M1 (100 trees with C4.5)                                  95.95
10   Products of boosted stumps (haar)* (Kégl & Busa-Fekete, 2009)      99.12
11   rODT (100 oblique decision trees, GIST)                            99.12

Figure 4: Sample digits from MNIST


The experimental results show that rODT with GIST reaches a recognition accuracy of 99.12%, making it one of the three best recognition methods among all those compared. Looking at the individual methods in detail, those marked with (*) indicate that the authors used specific processing to obtain good recognition results. This processing is often quite complex, for example haar-like feature extraction, distortion and alteration of the data, as well as very complex algorithm implementations such as convolutional networks (CNN) and products of boosting classifiers. For instance, the convolutional network CNN of (Simard et al., 2003), (O'Neill, 2006) and the AdaBoost.M1 algorithms (Freund & Schapire, 1995), (Witten & Frank, 2005) need nearly 1 day of training to reach the accuracy shown in the table. In contrast, the SVM algorithms need no special processing and only about 30 minutes of training (roughly 50 times faster), with nearly equivalent accuracy (98.37% for the RBF kernel, under one percentage point lower). The rODT algorithm needs about 15 minutes of training (roughly 100 times faster than the convolutional network) and still achieves an accuracy in the top 3. Importantly, rODT is very fast, simple, and easy to implement and integrate into a program.
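The quoted speed-ups follow from taking "nearly 1 day" as 24 hours of training; the worked arithmetic below is a rough check under that assumption, not a measurement from the paper.

```latex
\frac{24 \times 60\ \text{min}}{30\ \text{min}} = 48 \approx 50,
\qquad
\frac{24 \times 60\ \text{min}}{15\ \text{min}} = 96 \approx 100
```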

Figure 5: The digit recognition program (rODT, GIST)

5 CONCLUSION AND FUTURE WORK

We have presented a machine learning algorithm, the random forest of oblique decision trees (rODT), which uses global (GIST) features to recognize handwritten digits accurately. The pre-processing step extracts global features from digit images and produces a high-dimensional data table. We proposed a multi-class random oblique forest learning algorithm in which each member tree uses a hyperplane, based on linear discriminant analysis (LDA), to split the data efficiently at each node. Experimental results on the real MNIST dataset show that the proposed rODT algorithm recognizes digits very accurately when compared with current recognition methods. The proposed method achieves high recognition accuracy without requiring any special processing. Experiments on handwritten character recognition covering digits and the 26 letters of the alphabet show that our method performs well.

In the near future, we will combine this system with other methods to extract and read vehicle license plate numbers. The approach can also be applied to similar problems in recognition, classification and image retrieval.

REFERENCES
1. L. Breiman, J.H. Friedman, R.A. Olshen and C. Stone. Classification and Regression Trees. Wadsworth International, 1984.
2. L. Breiman. Bagging predictors. Machine Learning 24(2):123-140, 1996.
3. L. Breiman. Random forests. Machine Learning 45(1):5-32, 2001.
4. C.C. Chang and C.J. Lin. Libsvm: a library for support vector machines. 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
5. T.N. Do, S. Lallich, N.K. Pham and P. Lenca. Classifying very-high-dimensional data with random forests of oblique decision trees. In Advances in Knowledge Discovery and Management Vol. 292, Springer-Verlag, 2009, pp. 39-55.
6. M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg and C. Schmid. Evaluation of GIST descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009, pp. 1-8.
7. Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory, 1995, pp. 23-37.
8. B. Kégl and R. Busa-Fekete. Boosting products of base classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 497-504.
9. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, 1998, pp. 2278-2324.
10. Y. LeCun and C. Cortes. The MNIST database of handwritten digits. 1989.
11. D. Lowe. Distinctive image features from scale invariant keypoints. International Journal of Computer Vision, 2004, pp. 91-110.
12. S. Murthy, S. Kasif, S. Salzberg and R. Beigel. OC1: Randomized induction of oblique decision trees. In Proceedings of the Eleventh National Conference on Artificial Intelligence, 1993, pp. 322-327.
13. A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision 42, 145-175, 2001.
14. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
15. Y. Simard, D. Steinkraus and J. Platt. Best Practices for Convolutional Neural Network Applied to Visual Document Analysis. In Intl Conference on Document Analysis and Recognition, 2003, pp. 958-962.
16. V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
17. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.
