"Neural-Gas" Network for Vector Quantization and its Application to Time-Series Prediction

Thomas M. Martinetz, Member, IEEE, Stanislav G. Berkovich, and Klaus J. Schulten

Abstract: A neural network algorithm based on a "soft-max" adaptation rule is presented that exhibits good performance in reaching the optimum, or at least coming close to it, when minimizing the vector quantization distortion error, which, in general, has many local minima. The soft-max rule employed is an extension of the standard K-means clustering procedure and takes into account a "neighborhood ranking" of the reference (weight) vectors. It is shown that the dynamics of the reference (weight) vectors during the input-driven adaptation procedure 1) is determined by the gradient of an energy function whose shape can be modulated through a neighborhood-determining parameter, and 2) resembles the dynamics of Brownian particles moving in a potential determined by the data point density. The network is employed to represent the attractor of the Mackey-Glass equation and to predict the corresponding time series, with local linear mappings generating the output values. The results obtained for the time-series prediction compare very favorably with the results achieved by back-propagation and radial basis function networks.

I. INTRODUCTION

ARTIFICIAL as well as biological information processing systems that have to store or transmit large amounts of data often require the compression of this data into a more efficient representation. In many cases, e.g., in applications involving speech and image processing, this compression relies on "vector quantization" techniques (for a review see, e.g., [1]). Through vector quantization a data manifold, e.g., a submanifold V of R^D, is encoded by a finite set w = (w_1, ..., w_N) of reference or "codebook" vectors (also called cluster centers) w_i in R^D, i = 1, ..., N. A data vector v in V is described by the best-matching or "winning" reference vector w_{i(v)} of w, for which the distortion error, e.g., the squared error ||v − w_{i(v)}||², is minimal. This procedure divides the manifold V into a number of subregions

V_i = { v in V : ||v − w_i|| ≤ ||v − w_j|| for all j },    (1)

called Voronoi polygons or Voronoi polyhedra, within which each data vector v is described by the corresponding reference vector w_i. If the data points v over the manifold V are given through a density distribution P(v), the average distortion or reconstruction error is determined by the expression

E = ∫ d^D v P(v) (v − w_{i(v)})²    (2)

and has to be minimized through an optimal choice of reference vectors w_i.

The difficulty of this minimization lies in the fact that the distortion error E is a nonconvex function of the reference vectors and, in general, has many local minima. The standard procedure for minimizing E is K-means clustering [2], [3]: with each adaptation step a data point v in V is presented, the "winning" reference vector closest to v is determined, and the reference vectors are adjusted according to

Δw_i = ε δ_{i, i₀(v)} (v − w_i),    (3)

with δ_{ij} the Kronecker delta and i₀(v) the index of the winning reference vector. This "winner-take-all" rule corresponds to a stochastic gradient descent on (2) and, therefore, in general becomes trapped in a local minimum close to the initial configuration of reference vectors.

One approach to avoiding poor local minima is stochastic optimization akin to simulated annealing [5], e.g., "maximum-entropy" clustering [4], [6]. There, each data point v is assigned to every reference vector with a probability determined by a Gibbs distribution, and an adaptation step takes the "soft-max" form

Δw_i = ε (e^{−β(v − w_i)²} / Σ_j e^{−β(v − w_j)²}) (v − w_i),    (4)

where β is an inverse temperature parameter. Rule (4) corresponds to a stochastic gradient descent on the cost function

E(w, β) = −(1/(2β)) ∫ d^D v P(v) ln Σ_j e^{−β(v − w_j)²}.    (5)

During the adaptation procedure β is increased from small to large values. For β → ∞ the assignment becomes "hard" again and (5) becomes equivalent to the distortion error (2); with a sufficiently slow annealing of β good solutions can, in principle, be reached, but the required annealing schedules make the convergence very slow for practically feasible numbers of adaptation steps.

A third "soft-max" procedure is Kohonen's topology-conserving feature map [7]-[10], which has been applied to vector quantization problems in speech and image coding as well as to robot control tasks [10]-[18]. In Kohonen's algorithm each reference vector w_i is assigned to a site i of a lattice A. With each presentation of a data point v, the reference vectors are adjusted according to

Δw_i = ε h_σ(i, i₀(v)) (v − w_i),    (6)

where i₀(v) denotes the site of the lattice at which the reference vector closest to v is located, and h_σ(i, i₀) is a neighborhood function that decays with the lattice distance between i and i₀ over a range σ. The additional coupling introduced by the lattice is useful if the mapping of V onto A, i.e., the preservation of the topology of the data manifold, is itself of interest. For vector quantization, however, the lattice is advantageous only if its topology matches the topology of the data manifold V. If the structure or dimensionality of V is not known a priori, or if the data manifold is intricately structured, a matching lattice cannot be specified, and the coupling to the lattice leads to suboptimal distortion errors. In addition, for σ > 0 one cannot specify a cost function that is minimized by (6).

II. THE "NEURAL-GAS" ALGORITHM

In this paper we present a neural network model which, applied to the task of vector quantization, 1) converges quickly to low distortion errors, 2) reaches a distortion error E lower than that resulting from K-means clustering, maximum-entropy clustering (for practically feasible numbers of iteration steps), and Kohonen's feature map, and 3) at the same time obeys a gradient descent on an energy surface (like the maximum-entropy clustering, in contrast to Kohonen's feature map algorithm). For reasons we will give later, we call this network model the "neural-gas" network. Similar to the maximum-entropy clustering and Kohonen's feature map, the neural-gas network also uses a "soft-max" adaptation rule. However, instead of the distance ||v − w_i|| or of the arrangement of the w_i's within an external lattice, it utilizes a "neighborhood ranking" of the reference vectors w_i for the given data vector v.

Each time a data vector v is presented, we determine the "neighborhood ranking" (w_{i₀}, w_{i₁}, ..., w_{i_{N−1}}) of the reference vectors, with w_{i₀} being closest to v, w_{i₁} being second closest to v, and w_{i_k}, k = 0, ..., N − 1, being the reference vector for which there are k vectors w_j with ||v − w_j|| < ||v − w_{i_k}||. If we denote the number k associated with each vector w_i by k_i(v, w), which depends on v and the whole set w = (w_1, ..., w_N) of reference vectors, then the adaptation step we employ for adjusting the w_i's is given by

Δw_i = ε h_λ(k_i(v, w)) (v − w_i),    i = 1, ..., N.    (7)

The step size ε in [0, 1] describes the overall extent of the modification, and h_λ(k_i(v, w)) is unity for k_i = 0 and decays to zero for increasing k_i with a characteristic decay constant λ.
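For illustration, a single adaptation step of (7) can be sketched in a few lines of NumPy. The sketch below assumes the exponential neighborhood function h_λ(k) = exp(−k/λ) used in the simulations described in the next section; the function name neural_gas_step and all variable names are ours and not part of the original presentation.

```python
import numpy as np

def neural_gas_step(w, v, eps, lam):
    """One adaptation step of the neural-gas rule (7).

    w   : (N, D) array of reference vectors w_i
    v   : (D,) data vector drawn from P(v)
    eps : step size in [0, 1]
    lam : decay constant lambda of the neighborhood function
    """
    # Squared distances ||v - w_i||^2 for all reference vectors.
    dist2 = np.sum((w - v) ** 2, axis=1)
    # k_i(v, w): number of reference vectors closer to v than w_i,
    # obtained from the neighborhood ranking (argsort of the distances).
    ranking = np.argsort(dist2)           # indices ordered from closest to farthest
    k = np.empty(len(w), dtype=int)
    k[ranking] = np.arange(len(w))        # rank 0 for the "winner", 1 for the runner-up, ...
    # h_lambda(k_i) = exp(-k_i / lambda): unity for the winner, decaying with rank.
    h = np.exp(-k / lam)
    # Move every reference vector toward v, weighted by its rank (eq. (7)).
    w += eps * h[:, None] * (v - w)
    return w
```

For λ → 0 only the winner receives a non-negligible update, so the sketch reduces to the K-means rule (3); for large λ all reference vectors are pulled toward v.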
In the simulations described below we chose h_λ(k_i(v, w)) = e^{−k_i(v, w)/λ}. For λ → 0, (7) becomes equivalent to the K-means adaptation rule (3), whereas for λ ≠ 0 not only the "winner" w_{i₀} but also the second closest reference vector w_{i₁}, the third closest reference vector w_{i₂}, etc., are updated.

As we show in Appendix I, the dynamics of the w_i's obeys a stochastic gradient descent on the cost function

E_ng(w, λ) = (1/(2 C(λ))) Σ_{i=1}^{N} ∫ d^D v P(v) h_λ(k_i(v, w)) (v − w_i)²    (8)

with

C(λ) = Σ_{k=0}^{N−1} h_λ(k)

as a normalization factor that depends only on λ. E_ng is related to the framework of fuzzy clustering [24], [25]. In contrast to hard clustering, where each data point v is deterministically assigned to its closest reference vector w_{i₀}, fuzzy clustering associates v to a reference vector w_i with a certain degree p_i(v), the so-called fuzzy membership of v to cluster i. In the case of hard clustering, p_{i₀}(v) = 1 and p_i(v) = 0 for i ≠ i₀ holds. If we choose a "fuzzy" assignment of data point v to reference vector w_i that depends on whether w_i is the nearest, next-nearest, next-next-nearest, etc., neighbor of v, i.e., if we choose p_i(v) = h_λ(k_i(v, w))/C(λ), then the average distortion error we obtain, and which has to be minimized, is given by E_ng, and the corresponding gradient descent is given by adaptation rule (7).

Through the decay constant λ we can modulate the shape of the cost function E_ng. For λ → ∞ the cost function E_ng becomes parabolic, whereas for λ → 0 it becomes equivalent to the cost function E in (2), i.e., the cost function we ultimately want to minimize, but which has many local minima. Therefore, to obtain a good set of reference vectors, we start the adaptation process determined by (7) with a large decay constant λ and decrease λ with each adaptation step. By gradually decreasing the parameter λ we expect the local minima of E to emerge slowly, thereby preventing the set w of reference vectors from getting trapped in suboptimal states.

III. THE NETWORK'S PERFORMANCE ON A MODEL PROBLEM

To test the performance of the neural-gas algorithm in minimizing E and to compare it with the three other approaches we described (K-means clustering, maximum-entropy clustering, and Kohonen's topology-conserving map), we chose a data distribution P(v) for which 1) the global minimum of E is known for large numbers of reference vectors and 2) which reflects, at least schematically, essential features of data distributions that are typical in applications. Data distributions that arise in applications often consist of several, possibly separated, clusters of data points. Therefore, also for our test we chose a model data distribution that is clustered. To be able to determine the global minimum, in our model data distribution the clusters are of square shape within a two-dimensional input space. Since we choose N = 4 × (number of clusters) and separate the clusters far enough from each other, the optimal set of w_i's is given when each of the square clusters is represented by four reference vectors, and when the four reference vectors within each cluster are arranged in the known optimal configuration for a single square.
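For concreteness, the clustered model distribution just described might be sampled along the following lines; the cluster spacing, square side length, and helper names are illustrative choices of ours, not the values used in the original experiments.

```python
import numpy as np

def make_square_clusters(n_clusters=15, side=0.1, spacing=0.25, seed=0):
    """Centers of well-separated square clusters on a 2-D grid."""
    rng = np.random.default_rng(seed)
    grid = int(np.ceil(np.sqrt(n_clusters)))
    centers = np.array([(i * spacing, j * spacing)
                        for i in range(grid) for j in range(grid)])[:n_clusters]
    return rng, centers, side

def sample_data_point(rng, centers, side):
    """Draw one data point: pick a cluster with equal probability,
    then a point uniformly from the corresponding square."""
    c = centers[rng.integers(len(centers))]
    return c + rng.uniform(-side / 2, side / 2, size=2)
```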
In Fig. 1 we see the neural-gas network adapting to a representation of our model data distribution with 15 clusters and N = 60 reference vectors. With each adaptation step, a data point within one of the squares is stochastically chosen, with equal probability over each square. Subsequently, adjustments of the w_i's according to (7) are performed. We show the initial state, the states after 5000 and 15,000 steps, and finally the state after 80,000 adaptation steps.

Fig. 1. The neural-gas network representing a data distribution in a two-dimensional input space that consists of 15 separated clusters of square shape. On each cluster the data point density is homogeneous. The reference vectors w_i are depicted as points. The initial values for the w_i's are chosen randomly, as shown in the top left picture. We also show the state after 5000 (top right), 15,000 (bottom left), and 80,000 adaptation steps (bottom right). At the end of the adaptation procedure the set of reference vectors has converged to the optimal configuration, with each cluster represented by four reference vectors.

In the simulation run depicted in Fig. 1 the neural-gas algorithm was able to find the optimal representation of the data distribution. However, depending on the initial choice of the w_i's (chosen randomly) and depending on the speed with which the parameter λ is decreased, i.e., depending on the total number of adaptation steps t_max employed, it might happen that the reference vectors converge to a configuration that is only close to, but not exactly at, the optimum. Therefore, to demonstrate the average performance of the neural-gas algorithm, we show in Fig. 2 the mean distortion error for different total numbers of adaptation steps t_max. For each of the different total numbers of adaptation steps we averaged over 50 simulation runs, for each of which not only the initialization of the w_i's was chosen randomly but also the 15 clusters of our model data distribution were placed randomly. Since we know the minimal distortion error E_0 that can optimally be achieved for our model data distribution and the number of reference vectors we employ, we choose

α = (E(t_max) − E_0) / E_0

as a performance measure, with E(t_max) as the final distortion error reached. α = 0 corresponds to a simulation run that reached the global minimum, whereas, e.g., α = 1 corresponds to a very large distortion error, namely one that is twice as large as the optimum. As we can see in Fig. 2, for t_max = 100,000 the average performance of the neural-gas network is α = 0.09, which means that the average distortion error E for t_max = 100,000 is 9% larger than what can optimally be achieved.
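On a finite sample of data points, the distortion error (2) and the performance measure α can be estimated by a Monte Carlo average. A minimal sketch, assuming E_0 is known as in the model problem, could look as follows; the function names are ours.

```python
import numpy as np

def distortion_error(w, data):
    """Monte Carlo estimate of E in (2): mean squared distance of each
    data point to its closest ("winning") reference vector."""
    d2 = np.sum((data[:, None, :] - w[None, :, :]) ** 2, axis=2)
    return np.mean(np.min(d2, axis=1))

def performance_measure(w, data, e0):
    """alpha = (E(t_max) - E_0) / E_0; zero at the global optimum."""
    return (distortion_error(w, data) - e0) / e0
```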
For comparison, we also show in Fig. 2 the results achieved by K-means clustering, maximum-entropy clustering, and Kohonen's feature map algorithm. Up to t_max = 8000, only the distortion error of the K-means clustering is slightly smaller than the distortion error of the neural-gas algorithm. For t_max > 8000, all three procedures perform worse than the neural-gas algorithm. For a total number of 100,000 adaptation steps the distortion error of the maximum-entropy clustering is more than twice as large as the distortion error achieved by the neural-gas algorithm. Theoretically, for the maximum-entropy approach the performance measure α should converge to zero for t_max → ∞. However, as mentioned already in the introduction, the convergence might be extremely slow. Indeed, all four clustering procedures, including the maximum-entropy approach and the neural-gas algorithm, do not improve their final distortion error significantly further within the range 100,000 < t_max < 500,000, which is the limit up to which the tests were made.

Fig. 2. The performance of the neural-gas algorithm in minimizing the distortion error E for the model distribution of data points that is described in the text and an example of which is shown in Fig. 1. Depicted is the final performance α = (E − E_0)/E_0, with E_0 the minimal distortion error that can be achieved for the given data distribution and the number of reference vectors employed, plotted against the total number of adaptation steps t_max. For comparison, the same quantity is shown for the standard K-means clustering, the maximum-entropy clustering, and Kohonen's feature map algorithm. Up to t_max = 8000 only the distortion error of the K-means clustering is slightly smaller than that of the neural-gas algorithm. For t_max > 8000 the three other procedures perform worse than the neural-gas model. For a total number of 100,000 adaptation steps the distortion error achieved by the maximum-entropy procedure is larger by more than a factor of two than the distortion error achieved by the neural-gas algorithm.

Fig. 2 demonstrates that the convergence of the neural-gas algorithm is faster than the convergence of the three other approaches. This is important for practical applications in which adaptation steps are "expensive," e.g., in robot control, where each adaptation step corresponds to a trial movement of the robot arm. In applications that require the learning of input-output relations, vector quantization networks establish a representation of the input space that can then be used for generating output values, either through discrete output values [26], local linear mappings [12], or radial basis functions [27]. Kohonen's topology-conserving map as a vector quantizer, together with local linear mappings for generating output values, has been studied for a number of robot control tasks [12], [16]-[18]. However, because of its faster convergence, we took the neural-gas algorithm for an implementation of the learning algorithms [16]-[18] on an industrial robot arm [28]. Compared to the versions that are based on Kohonen's feature map and require about 6000 adaptation steps (trial movements of the robot arm) to reach the minimal positioning error [18], only 3000 steps are sufficient when the neural-gas network is employed [28].

For the simulations of the neural-gas network as presented in Fig. 2 we chose h_λ(k) = exp(−k/λ), with λ decreasing exponentially with the number of adaptation steps t, i.e., λ(t) = λ_i (λ_f/λ_i)^{t/t_max}, with λ_i = 10, λ_f = 0.01, and t_max covering the range shown in Fig. 2. Compared to other choices for the neighborhood function h_λ(k), e.g., Gaussians, h_λ(k) = exp(−k/λ) provided the best results. The step size ε has the same time dependence as λ, i.e., ε(t) = ε_i (ε_f/ε_i)^{t/t_max}, with ε_f = 0.005. The similarity of the neural-gas network and the Kohonen algorithm motivated this time dependence x(t) = x_i (x_f/x_i)^{t/t_max} for ε and λ; it has provided good results in applications of the Kohonen network [16]-[18]. The particular choice of λ_i, λ_f, ε_i, and ε_f is not very critical and was optimized by trial and error.
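Putting the pieces together, a complete simulation run with the exponentially decaying schedules for λ and ε could be sketched as follows, reusing neural_gas_step from the sketch in Section II. The schedule endpoints λ_i = 10, λ_f = 0.01, and ε_f = 0.005 mirror the values quoted above; the remaining defaults, and the generic data_sampler callable, are illustrative assumptions of ours.

```python
import numpy as np

def schedule(x_i, x_f, t, t_max):
    """Exponential interpolation x(t) = x_i * (x_f / x_i) ** (t / t_max)."""
    return x_i * (x_f / x_i) ** (t / t_max)

def train_neural_gas(data_sampler, n_ref=60, dim=2, t_max=80000,
                     lam_i=10.0, lam_f=0.01, eps_i=0.5, eps_f=0.005, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, size=(n_ref, dim))   # random initial reference vectors
    for t in range(t_max):
        lam = schedule(lam_i, lam_f, t, t_max)     # annealed decay constant lambda(t)
        eps = schedule(eps_i, eps_f, t, t_max)     # annealed step size epsilon(t)
        v = data_sampler(rng)                      # draw one data point from P(v)
        w = neural_gas_step(w, v, eps, lam)        # adaptation step (7), see Section II
    return w
```

For the model problem, data_sampler could be bound to the square-cluster sampler above, e.g., lambda rng: sample_data_point(rng, centers, side).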
The only simulation parameter of the adaptive K-means clustering is the step size ε, which was chosen in our simulations to be identical to that of the neural-gas algorithm. In contrast to the three other vector quantization procedures, the final result of the K-means clustering depends very much on the quality of the initial distribution of the reference vectors w_i. Therefore, to avoid a comparison biased in favor of the neural-gas network, we initialized the K-means algorithm in a more prestructured way by exploiting a priori knowledge about the data distribution. Rather than initializing the w_i's totally at random, they were randomly assigned to data vectors lying within the 15 clusters. This choice prevents some of the codebook vectors from remaining unused.

For the Monte Carlo simulations of the maximum-entropy clustering the step size ε was also chosen as for the neural-gas algorithm. The inverse temperature β had the time dependence β(t) = β_i (β_f/β_i)^{t/t_max}, with β_i = 1 and β_f = 10,000. This scheduling of β provided the best results for the range of total numbers of adaptation steps t_max that was investigated.

Also for Kohonen's feature map algorithm the step size ε was chosen as for the neural-gas algorithm and the other two clustering procedures. The function h_σ(i, j) that determines the neighborhood relation between site i and site j of the lattice A of Kohonen's feature map algorithm was chosen to be a Gaussian of the form h_σ(i, j) = exp(−||i − j||²/2σ²) [9], [16]-[18]. The decay constant σ, like λ, decreased with the adaptation steps t according to σ(t) = σ_i (σ_f/σ_i)^{t/t_max}, with σ_f = 0.01. The values of σ_i and σ_f were optimized.

IV. "GAS-LIKE" DYNAMICS OF THE REFERENCE VECTORS

In this section we explain the name "neural gas" and give a quantitative expression for the density distribution of the reference vectors. We define the density ρ(u) of reference vectors at location u of V ⊆ R^D through ρ(u) = 1/F_{i(u)}, with F_{i(u)} being the volume of the Voronoi polygon V_{i(u)} around the reference vector closest to u. According to the definition of V_{i(u)}, which was given in (1), u ∈ V_{i(u)} is valid. Hence, ρ(u) is a step function that is constant on each Voronoi polygon V_i. In the following we study the case where the Voronoi polygons change their size F_i only slowly from one Voronoi polygon to the next. Then we can regard ρ(u) as being continuous, which allows us to derive an expression for the dependence of ρ(u) on the density distribution of data points P(u).

For a given v, the density distribution ρ(u) determines the numbers k_i(v, w), i = 1, ..., N, which are necessary for an adjustment of the reference vectors w_i. k_i(v, w) is the number of reference vectors within a sphere centered at v with radius ||v − w_i||, i.e.,

k_i(v, w) = ∫_{||u−v|| ≤ ||v−w_i||} d^D u ρ(u).    (9)

In the following we look at the average change ⟨Δw_i⟩ of a reference vector with an adaptation step (7), given through

⟨Δw_i⟩ = ε ∫ d^D v P(v) h_λ(k_i(v, w)) (v − w_i).    (10)

In the case of a small decay constant λ, i.e., a λ for which the range of h_λ(k_i(v, w)) is small compared to the curvature of P(u) and ρ(u), we may expand the integrand of (10) around w_i, since only the data points v for which ||v − w_i|| is small contribute to the integral. If, as in the simulations described previously, λ decreases toward zero with the number of adaptation steps, the range of h_λ(k_i(v, w)) will always become that small at some point during the adaptation process. The expansion of the integrand together with (9) yields, to leading order in λ,

⟨Δw(u)⟩ ∝ ρ(u)^{−(1+2/D)} [ ∂_u P(u) − (1 + 2/D) (P(u)/ρ(u)) ∂_u ρ(u) ],    (11)

where ∂_u denotes the gradient with respect to the coordinates of the data space. Equation (11) states that the average change of a reference vector w_i at location u is determined by two terms, one of which is proportional to ∂_u P(u) at u and one of which is proportional to ∂_u ρ(u) at u. The derivation of (11) is provided in Appendix II.

Equation (11) suggests the name "neural gas" for the algorithm introduced here. The average change of the reference vectors corresponds to an overdamped motion of particles in a potential V(u) that is given by the negative data point density, i.e., V(u) = −P(u). Superimposed on the gradient of this potential is a "force" proportional to −∂_u ρ(u), which points toward the regions of the space where the particle density ρ(u) is low.
This "force" is the result of a repulsive coupling between the particles (reference vectors). In its form it resembles an entropic force and tends to distribute the particles (reference vectors) homogeneously over the input space, as in the case of a diffusing gas.

The stationary solution of (11), i.e., the solution of ⟨Δw⟩ = 0, is given by

ρ(u) ∝ P(u)^γ    (12)

with

γ = D / (D + 2).    (13)

This relation describes the asymptotic density distribution of the reference vectors w_i and states that the density ρ(u) of reference vectors at location u is nonlinearly proportional to the density of data points P(u). An asymptotic density distribution of the reference vectors that is proportional to P(u)^γ with γ = D/(D + 2) is optimal for the task of minimizing the average quadratic distortion error (2) [29].

We tested (13) for a one-dimensional data distribution, i.e., for D = 1. For this purpose we chose a data density distribution of the form P(u) = 2u, u ∈ [0, 1], and N = 50 reference vectors. The initial values for the w_i ∈ R were drawn randomly from the interval [0, 1]. For the parameters ε and λ we chose the small but finite values ε = 0.01 and λ = 2, which were kept constant during a subsequent run of 5,000,000 adaptation steps. A double-logarithmic fit of the final result, i.e., of the 50 local densities ρ(w_i) = 2/(w_{i+1} − w_{i−1}), i = 1, ..., 50, against P(w_i), yields an exponent of 0.323, which compares well with the theoretical value γ = 1/3 given by (13).
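The exponent in (12) and (13) can be checked numerically along the lines of the one-dimensional experiment just described. The following sketch runs the neural gas with fixed ε = 0.01 and λ = 2 on P(u) = 2u and estimates the exponent from a double-logarithmic fit of ρ(w_i) against P(w_i); these parameter values follow the text, while the sampling and fitting details are our own choices (the step count may be reduced for a quick check, since the loop is slow in pure Python).

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps, lam, steps = 50, 0.01, 2.0, 5_000_000      # values from the text; reduce steps to test quickly
w = np.sort(rng.uniform(0.0, 1.0, N))              # 1-D reference vectors, random initial values

for _ in range(steps):
    v = np.sqrt(rng.uniform())                     # inverse-CDF sampling of P(u) = 2u on [0, 1]
    k = np.empty(N, dtype=int)
    k[np.argsort(np.abs(w - v))] = np.arange(N)    # neighborhood ranks k_i(v, w)
    w += eps * np.exp(-k / lam) * (v - w)          # adaptation step (7)

w = np.sort(w)
rho = 2.0 / (w[2:] - w[:-2])                       # local density rho(w_i) = 2 / (w_{i+1} - w_{i-1})
p = 2.0 * w[1:-1]                                  # data density P(w_i) = 2 w_i
gamma = np.polyfit(np.log(p), np.log(rho), 1)[0]   # slope of the double-logarithmic fit
print(gamma)                                       # should come out close to D/(D+2) = 1/3
```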
V. ON THE COMPLEXITY OF THE NEURAL-GAS NETWORK

The computationally expensive part of an adaptation step of the neural-gas algorithm is the determination of the "neighborhood ranking," i.e., of the k_i, i = 1, ..., N. In a parallel implementation of the neural-gas network, each reference vector w_i can be assigned to a computational unit i. To determine its k_i, each unit i has to compare the distance ||v − w_i|| of its reference vector to the input v with the distances ||v − w_j|| of all the other units j, j = 1, ..., N. If each unit performs this comparison in a parallelized way, each unit i needs O(log N) time steps to determine its "neighborhood rank" k_i. In a subsequent time step, each computational unit i adjusts its w_i according to (7). Hence, in a parallel implementation the computation time required for an adaptation step of the neural-gas algorithm increases like log N with the number N of reference vectors.

A scaling like log N is an interesting result, since the computation time for an adaptation step of a "winner-take-all" network like the K-means clustering algorithm, which requires much less computation because only the "winning" unit has to be determined, also scales like log N in a parallel implementation. In a serial implementation, of course, the computation time required for an adaptation step of the neural-gas algorithm increases faster with N than the corresponding computation time for a step of the K-means clustering. Determining the k_i, i = 1, ..., N, in a serial implementation corresponds to sorting the distances ||v − w_i||, i = 1, ..., N, which scales like N log N. Searching for the smallest distance ||v − w_i|| in order to perform a step of the K-means clustering scales only linearly with the number of reference vectors.

VI. APPLICATION TO TIME-SERIES PREDICTION

A very interesting learning problem is the prediction of deterministic but chaotic time series, which we take as an application example of the neural-gas network. The particular time series we choose is the one generated by the Mackey-Glass equation [30]. The prediction task requires learning an input-output mapping y = f(v) from a current state v of the time series (a vector of D consecutive time-series values) to a prediction of a future time-series value y.

If one chooses D large enough, e.g., D = 4 in the case of the Mackey-Glass equation, the D-dimensional state vectors v all lie within a limited part of the D-dimensional space and form the attractor V of the Mackey-Glass equation. In order to approximate the input-output relation y = f(v), we partition the attractor's domain into N smaller subregions V_i, i = 1, ..., N, and complete the mapping by choosing local linear mappings to generate the output values on each subregion V_i. To achieve optimal results with this approach, the partitioning of V into subregions has to be optimized by choosing V_i's whose overall size is as small as possible.

A way of partitioning V is to employ a vector quantization procedure and take the resulting Voronoi polygons as subregions V_i. Breaking up the domain region of an input-output relation by employing a vector quantization procedure and approximating the input-output relation by local linear mappings was suggested in [12]. Based on Kohonen's feature map algorithm, this approach has been applied successfully to various robot control tasks [12], [16]-[18] and has also been applied to the task of predicting time series [31]. However, as we have shown in Section III, for intricately structured input manifolds the Kohonen algorithm leads to suboptimal partitionings and, therefore, provides only suboptimal approximations of the input-output relation y = f(v). The attractor of the Mackey-Glass equation forms such a manifold. Its topology and its fractal dimension of 2.1 for the parameters chosen make it impossible to specify a corresponding lattice structure. For this reason we employ the neural-gas network for partitioning the input space V, which allows us to achieve good or even optimal subregions V_i also in the case of topologically intricately structured input spaces.

A hybrid approximation procedure that also uses a vector quantization technique for preprocessing the input domain, in order to obtain a convenient coding for generating output values, has been suggested by Moody and Darken [27]. In their approach, preprocessing the input signals, for which they used K-means clustering, serves the task of distributing the centers w_i of a set of radial basis functions, i.e., Gaussians, over the input domain. The approximation of the input-output relation is then achieved through superpositions of the Gaussians. Moody and Darken demonstrated the performance of their approach also for the problem of predicting the Mackey-Glass time series. A comparison of their result with the performance we achieve with the neural-gas network combined with local linear mappings is given below.

A. Adaptive Local Linear Mappings

The task is to adaptively approximate the function y = f(v), with v ∈ V ⊆ R^D and y ∈ R, where V denotes the function's domain region. Our network consists of N computational units, each containing a reference or weight vector w_i (for the neural-gas algorithm) together with a constant y_i and a D-dimensional vector a_i. The neural-gas procedure assigns each unit i to a subregion V_i as defined in (1), and the coefficients y_i and a_i define a linear mapping

ỹ = y_i + a_i · (v − w_i)    (14)

from R^D to R over each of the Voronoi polyhedra V_i. Hence, the function y = f(v) is approximated by ỹ = f̃(v) with

f̃(v) = y_{i(v)} + a_{i(v)} · (v − w_{i(v)}),    (15)

where i(v) denotes the computational unit i with its w_i closest to v.
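A minimal sketch of the resulting predictor (15): each unit stores (w_i, y_i, a_i), and the output for an input v is the linear expansion of the winning unit. The function name and array layout are ours, not part of the original presentation.

```python
import numpy as np

def predict(v, w, y, a):
    """Local linear mapping (15): output of the unit whose w_i is closest to v.

    w : (N, D) reference vectors, y : (N,) offsets, a : (N, D) slopes
    """
    i = np.argmin(np.sum((w - v) ** 2, axis=1))   # winning unit i(v)
    return y[i] + a[i] @ (v - w[i])               # y_i + a_i . (v - w_i), eq. (14)
```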
To learn the input-output mapping, we perform a series of training steps by presenting input-output pairs (v, y) with y = f(v). The reference vectors w_i are adjusted according to adaptation step (7) of the neural-gas algorithm. To obtain adaptation rules for the output coefficients y_i and a_i, for each i we require the mean squared error ∫_{V_i} d^D v P(v) (f(v) − f̃(v))² between the actual and the required output, averaged over the subregion V_i, to be minimal. A gradient descent with respect to y_i and a_i yields

Δy_i = ε' ∫_{V_i} d^D v P(v) (y − y_i − a_i · (v − w_i))    (16)

and

Δa_i = ε' ∫_{V_i} d^D v P(v) (y − y_i − a_i · (v − w_i)) (v − w_i).    (17)

For λ → 0 in adaptation step (7), the neural-gas algorithm provides an equilibrium distribution of the w_i's for which ∫_{V_i} d^D v P(v)(v − w_i) = 0 holds for each i, i.e., w_i is the center of gravity of the subregion V_i. Hence, the term involving a_i · (v − w_i) in (16) vanishes and the adaptation step for the y_i's takes on a form similar to the adaptation step of the w_i's, except that only the output of the "winner" unit i(v) is adjusted in a training step. To obtain a significantly faster convergence of the output weights y_i and a_i, we replace the restriction to V_i in (16) and (17) by the factor h_{λ'}(k_i(v, w)), which has the same form as h_λ(k_i(v, w)) in adaptation step (7), except that the decay constant λ' may have a value different from λ. By this replacement we achieve that the y_i and a_i of each unit i are updated in every training step, with a step size that decreases with the unit's "neighborhood rank" relative to the current input v. In the beginning of the training procedure, λ' is large and the range of input signals that affect the weights of a unit i is large. As the number of training steps increases, λ' decreases to zero and the fine tuning of the output weights to the local shape of f(v) takes place. Hence, in the on-line formulation, the adaptation steps we use for adjusting y_i and a_i are given by

Δy_i = ε' h_{λ'}(k_i(v, w)) (y − y_i − a_i · (v − w_i)),
Δa_i = ε' h_{λ'}(k_i(v, w)) (y − y_i − a_i · (v − w_i)) (v − w_i).    (18)

B. Prediction of the Mackey-Glass Time Series

The time series we want to predict with our network algorithm is generated by the Mackey-Glass equation

ẋ(t) = b x(t) + a x(t − τ) / (1 + x(t − τ)^{10})

with parameters a = 0.2, b = −0.1, and τ = 17 [30]. x(t) is quasi-periodic and chaotic, with a fractal attractor dimension of 2.1 for the parameter values we chose. The characteristic time constant of x(t) is t_char = 50, which makes it particularly difficult to forecast x(t + Δt) with Δt ≥ 50.

Input v of our network algorithm consists of four past values of x(t), i.e., v = (x(t), x(t − 6), x(t − 12), x(t − 18)). Embedding a set of time-series values in a state vector in this way is common to several approaches, including those of Moody and Darken [27], Lapedes and Farber [32], and Sugihara and May [33]. The time span we want to forecast into the future is Δt = 90. For that purpose we iteratively predict x(t + 6), x(t + 12), etc., until we have reached x(t + 90) after 15 such iterations. Because of this iterative forecasting, the output y that corresponds to v and that is used for training the network is the true value of x(t + 6).

We studied several different training procedures. First, we trained several networks of different sizes using 100,000 to 200,000 training steps and 100,000 training pairs v = (x(t), x(t − 6), x(t − 12), x(t − 18)), y = x(t + 6). One could deem this training "on-line" because of the abundant supply of data.
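For reference, one combined supervised training step on a pair (v, y), i.e., the neural-gas update (7) for the w_i's together with the output-coefficient updates (18), might be sketched as follows, computing the neighborhood ranking once per input. The step sizes and decay constants ε, λ, ε', λ' are assumed to follow the schedules described below; the function name and in-place array updates are our own choices.

```python
import numpy as np

def training_step(v, y_target, w, y, a, eps, lam, eps_p, lam_p):
    """One supervised step: eq. (7) for w_i and eq. (18) for y_i, a_i."""
    k = np.empty(len(w), dtype=int)
    k[np.argsort(np.sum((w - v) ** 2, axis=1))] = np.arange(len(w))
    h = np.exp(-k / lam)                              # h_lambda(k_i) for the reference vectors
    h_p = np.exp(-k / lam_p)                          # h_lambda'(k_i) for the output coefficients
    diff = v - w                                      # (N, D): v - w_i for every unit
    err = y_target - y - np.sum(a * diff, axis=1)     # y - y_i - a_i . (v - w_i)
    w += eps * h[:, None] * diff                      # eq. (7)
    y += eps_p * h_p * err                            # eq. (18), offset update
    a += eps_p * (h_p * err)[:, None] * diff          # eq. (18), slope update
    return w, y, a
```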
Fig. 3 presents the temporal evolution of the neural-gas network adapting to a representation of the Mackey-Glass attractor; we show a three-dimensional projection of the four-dimensional input space. The initialization of the reference vectors, presented in the top left part of Fig. 3, is random. After 500 training steps, the reference vectors have "contracted" coarsely to the relevant part of the input space (top right). With further training steps (20,000, bottom left), the network assumes the general shape of the attractor, and at the end of the adaptation procedure (100,000 training steps, bottom right) the reference vectors are distributed homogeneously over the Mackey-Glass attractor. The small dots depict already presented inputs v, whose distribution is given by the shape of the attractor.

Fig. 4. The normalized prediction error versus the size of the network (total number of weights) for the neural-gas approach combined with local linear mappings (1), for Moody and Darken's K-means/RBF method (2) [27], and for back-propagation (3) [32]. To reach the same prediction error, the K-means/RBF method requires about 10 times more weights than the neural-gas algorithm.

Fig. 4 shows the normalized prediction error as a function of network size. The size of a network is the number of its weights, with nine weights per computational unit (four each for w_i and a_i, plus one for y_i). The prediction error is determined by the rms value of the absolute prediction error for Δt = 90, divided by the standard deviation of x(t). As we can see in Fig. 4, compared to the results Moody and Darken obtained with K-means clustering plus radial basis functions [27], the neural-gas network combined with local linear mappings requires about 10 times fewer weights to achieve the same prediction error. The horizontal line in Fig. 4 shows the prediction error obtained by Lapedes and Farber with the back-propagation algorithm [32]. Lapedes and Farber tested only one network size. On a set of only 500 data points they achieved a normalized prediction error of about 0.05. However, their learning time was on the order of an hour on a Cray X-MP. For comparison, we trained a network using 1000 data points and obtained the same prediction error of 0.05; however, training took only 90 s on a Silicon Graphics IRIS, which achieves 4 MFlops on LINPACK benchmarks. To achieve comparable results, Moody and Darken employed about 13,000 data points, which required 1800 s at 90 kFlops. Hence, compared to our learning procedure, Moody and Darken's approach requires a much larger data set but is about twice as fast. However, because of possible variations in operating systems and other conditions, both speeds can be considered comparable.

Fig. 5 shows the results of a study of our algorithm's performance in an "off-line," or scarce-data, environment. We trained networks of various sizes through 200 epochs (or 200,000 steps, whichever is smaller) on training sets of different sizes. Due to the effect of overfitting, small networks achieve a better performance than large networks if the training set of data points is small. With increasingly large amounts of data the prediction error for the different network sizes saturates and approaches its lower bound.

Fig. 5. The normalized prediction error versus the size of the training data set for networks of various sizes. Due to the effect of overfitting, small networks achieve a better performance than large networks if the training set of data points is small. With increasingly large amounts of data the prediction error for the different network sizes approaches its lower bound.

As in the simulations described in Section III, the parameters ε, λ, ε', and λ' had the time dependence x(t) = x_i (x_f/x_i)^{t/t_max}, with t the current and t_max the total number of training steps.
The initial and final values for the simulation parameters were ε_i = 0.99, ε_f = 0.001; λ_i = N/3, λ_f = 0.0001; ε'_i = 0.5, ε'_f = 0.05; λ'_i = N/6, and λ'_f = 0.05. As in the simulations of Section III, the particular choice of these parameter values is not very critical and was optimized by trial and error. Both h_λ(k) and h_{λ'}(k) decreased exponentially with k.

VII. DISCUSSION

In this paper we presented a novel approach to the task of minimizing the distortion error of vector quantization coding. The goal was to present an approach that does not require any prior knowledge about the set of data points and, at the same time, converges quickly to optimal or at least near-optimal distortion errors. The adaptation rule we employed is a soft-max version of the K-means clustering algorithm and resembles, to a certain extent, both the adaptation rule of maximum-entropy clustering and Kohonen's feature map algorithm. However, compared to maximum-entropy clustering, it is the distance ranking, rather than the absolute distance, of the reference vectors to the current data vector that determines the adaptation step. Compared to Kohonen's feature map algorithm, it is not the neighborhood ranking of the reference vectors within an external lattice but the neighborhood ranking within the input space that is taken into account.

We compared the performance of the neural-gas approach with K-means clustering, maximum-entropy clustering, and Kohonen's feature map algorithm on a model data distribution that consisted of a number of separated data clusters. On the model data distribution we found that the neural-gas algorithm 1) converges faster and 2) reaches smaller distortion errors than the three other clustering procedures. The price for the faster convergence to smaller distortion errors, however, is a higher computational effort. In a serial implementation the computation time of the neural-gas algorithm scales like N log N with the number N of reference vectors, whereas the three other clustering procedures all scale only linearly with N. Nonetheless, in a highly parallel implementation the computation time required for the neural-gas algorithm becomes the same as for the three other approaches, namely O(log N).

We showed that, in contrast to Kohonen's feature map algorithm, the neural-gas algorithm minimizes a global cost function. The shape of the cost function depends on the neighborhood parameter λ, which determines the range of the global adaptation of the reference vectors. The form of the cost function relates the neural-gas algorithm to fuzzy clustering, with an assignment degree of a data point to a reference vector that depends on the reference vector's neighborhood rank relative to this data point. Through an analysis of the average change of the reference vectors for small but finite λ, we could demonstrate that the dynamics of the neural-gas network resembles the dynamics of a set of particles diffusing in a potential. The potential is given by the negative density distribution of the data points, which leads to a higher density of reference vectors in those regions where the data point density is high. A quantitative relation between the density of reference vectors and the density of data points could be derived.

To demonstrate the performance of the neural-gas algorithm, we chose the problem of predicting the chaotic time series generated by the Mackey-Glass equation. The neural-gas network had to form an efficient representation of the underlying attractor, which has a fractal dimension of 2.1.
The representation (a discretization of the relevant parts of the input space) was utilized to learn the required output, i.e., a forecast of the time series, by using local linear mappings. A comparison with the performance of K-means clustering combined with radial basis functions showed that the neural-gas network requires an order of magnitude fewer weights to achieve the same prediction error. Also the generalization capabilities of the neural-gas algorithm combined with local linear mappings compared favorably with the generalization capabilities of the RBF approach: to achieve identical accuracy, the RBF approach requires a training data set that is larger by an order of magnitude than the training data set which is sufficient for a neural-gas network with local linear mappings.

APPENDIX I

We show that the dynamics of the neural-gas network, described by adaptation step (7), corresponds to a stochastic gradient descent on a potential function. For this purpose we prove the following:

Theorem: For a set of reference vectors w = (w_1, ..., w_N), w_i ∈ R^D, and a density distribution P(v) of data points v ∈ R^D over the input space V ⊆ R^D, the relation

∂E_ng/∂w_i = −(1/C(λ)) ∫ d^D v P(v) h_λ(k_i(v, w)) (v − w_i)    (19)

with

E_ng = (1/(2 C(λ))) Σ_{j=1}^{N} ∫ d^D v P(v) h_λ(k_j(v, w)) (v − w_j)²    (20)

is valid. k_j(v, w) denotes the number of reference vectors w_l with ||v − w_l|| < ||v − w_j||.

Proof: For notational convenience we define d_j ≡ (v − w_j)². Differentiating (20) with respect to w_i yields

∂E_ng/∂w_i = −(1/C(λ)) ∫ d^D v P(v) h_λ(k_i(v, w)) (v − w_i) + R_i    (21)

with

R_i = (1/(2 C(λ))) Σ_{j=1}^{N} ∫ d^D v P(v) h'_λ(k_j(v, w)) (∂k_j(v, w)/∂w_i) d_j.    (22)

h'_λ(·) denotes the derivative of h_λ(·). We have to show that R_i vanishes for each i = 1, ..., N.

For k_j(v, w) the relation

k_j(v, w) = Σ_l θ(d_j − d_l)    (23)

is valid, with θ(·) as the Heaviside step function, θ(x) = 1 for x > 0 and θ(x) = 0 for x < 0. The derivative of the Heaviside step function θ(x) is the delta distribution δ(x), with δ(x) = 0 for x ≠ 0 and ∫ dx δ(x) = 1. Since ∂d_i/∂w_i = −2(v − w_i) and ∂d_j/∂w_i = 0 for j ≠ i, this yields

R_i = −(1/C(λ)) ∫ d^D v P(v) h'_λ(k_i(v, w)) d_i (v − w_i) Σ_{l≠i} δ(d_i − d_l)
    + (1/C(λ)) Σ_{j≠i} ∫ d^D v P(v) h'_λ(k_j(v, w)) d_j (v − w_i) δ(d_j − d_i).    (24)

Each of the integrands in the second term of (24) is nonvanishing only for those v for which d_j = d_i holds. For these v we can write

k_j(v, w) = Σ_l θ(d_j − d_l) = Σ_l θ(d_i − d_l) = k_i(v, w),    d_j = d_i,    (25)

and, hence, we obtain

R_i = −(1/C(λ)) ∫ d^D v P(v) h'_λ(k_i(v, w)) d_i (v − w_i) Σ_{l≠i} δ(d_i − d_l)
    + (1/C(λ)) ∫ d^D v P(v) h'_λ(k_i(v, w)) d_i (v − w_i) Σ_{j≠i} δ(d_j − d_i).    (26)

Since δ(x) = δ(−x) is valid, the two terms cancel and R_i vanishes for each i = 1, ..., N.

APPENDIX II

In the following we provide a derivation of (11). The average change ⟨Δw_i⟩ of a reference vector with adaptation step (7) is given by

⟨Δw_i⟩ = ε ∫ d^D v P(v) h_λ(k_i(v, w)) (v − w_i)    (27)

with h_λ(k_i(v, w)) being a factor that determines the size of the adaptation step and that depends on the number k_i of reference vectors w_j being closer to v than w_i. We assume h_λ(k_i) to be unity for k_i = 0 and to decrease to zero for increasing k_i with a characteristic decay constant λ. We can express k_i(v, w) by

k_i(v, w) = ∫_{||u−v|| ≤ ||w_i−v||} d^D u ρ(u)    (28)

with ρ(u) as the density distribution of the reference vectors in the input space V ⊆ R^D. For a given set w = (w_1, ..., w_N) of reference vectors, k_i(v, w) depends only on r = v − w_i and, therefore, we introduce

x(r) ≡ k_i(v, w).    (29)

In the following we assume the limit of a continuous distribution of reference vectors w_i, i.e., we assume ρ(u) to be analytical and nonzero over the input space V. Then, since ρ(u) > 0, for a fixed direction of r the quantity x(r) increases strictly monotonically with ||r|| and, therefore, the inverse of x(r), denoted by r(x), exists. This yields for (27)

⟨Δw_i⟩ = ε ∫ d^D x P(w_i + r(x)) h_λ(x) r(x) J(x)    (30)

with J(x) = det(∂r_μ/∂x_ν), μ, ν = 1, ..., D, and x = ||x||. We assume h_λ(k_i(v, w)) to decrease rapidly to zero with increasing ||v − w_i||, i.e., we assume the range of h_λ within V to be small enough that we may neglect any higher derivatives of P(u) and ρ(u) within this range.
Then we may replace P(w_i + r(x)) and J(x) in (30) by the first terms of their Taylor expansions around x = 0, yielding

⟨Δw_i⟩ = ε ∫ d^D x h_λ(x) [P(w_i) + r(x) · ∂P(w_i) + ...] r(x) [J(0) + ...].    (31)

The Taylor expansion of r(x) can be obtained by determining the inverse of x(r) in (29) for small ||r||. For small ||r|| (in (28) it holds that ||u − v|| ≤ ||r||), the density ρ around v is given by

ρ(v + u') = ρ(w_i + r + u') = ρ(w_i) + (r + u') · ∂ρ(w_i) + O(r²)    (32)

with ∂ ≡ ∂/∂u. Together with (29), this yields

x(r) = B_D r^D ρ(w_i) (1 + r · ∂ρ(w_i)/ρ(w_i) + O(r²)),    (33)

where r = ||r|| and

B_D = π^{D/2} / Γ(D/2 + 1)    (34)

is the volume of a sphere with radius one in dimension D. As the inverse of (33) we obtain, for small x,

r(x) = (x / (B_D ρ(w_i)))^{1/D} (1 − (1/D) r · ∂ρ(w_i)/ρ(w_i) + ...),    (35)

which provides the first terms of the Taylor expansion of r(x) around x = 0. Differentiating (35) gives ∂r_μ/∂x_ν; the off-diagonal terms contribute only in quadratic order to the Jacobian J(x) = det(∂r_μ/∂x_ν), so that to linear order only the diagonal terms, and hence the gradients of ρ, enter J(x). Inserting (35) and the resulting expansion of J(x) into (31), and noting that the integrals over terms of odd order in x vanish because of the rotational symmetry of h_λ, one is left with two contributions: one proportional to ∂_u P(w_i) and one proportional to P(w_i) ∂_u ρ(w_i)/ρ(w_i). Considering only the leading nonvanishing term (leading in λ), we finally obtain

⟨Δw_i⟩ ∝ ρ(w_i)^{−(1+2/D)} [ ∂_u P(u) − (1 + 2/D) (P(u)/ρ(u)) ∂_u ρ(u) ]|_{u = w_i},    (41)

which is the result quoted as (11) in Section IV; the positive proportionality factor depends only on ε, λ, and D.

ACKNOWLEDGMENT

The authors would like to thank J. Walter for many discussions on the problem of predicting chaotic time series and for providing parts of the prediction software. They also thank R. Der and T. Villmann for valuable remarks on the theorem in Appendix I.

REFERENCES

[1] R. M. Gray, "Vector quantization," IEEE ASSP Mag., vol. 1, no. 2, pp. 4-29, 1984.
[2] S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inform. Theory, vol. IT-28, pp. 129-137, 1982.
[3] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, L. M. LeCam and J. Neyman, Eds., 1967.
[4] K. Rose, E. Gurewitz, and G. Fox, "Statistical mechanics and phase transitions in clustering," Physical Review Letters, vol. 65, no. 8, pp. 945-948, 1990.
[5] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721-741, 1984.
[6] S. J. Nowlan, "Maximum likelihood competitive learning," in Advances in Neural Information Processing Systems 2, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1990.
[7] T. Kohonen, "Self-organized formation of topologically correct feature maps," Biological Cybernetics, vol. 43, pp. 59-69, 1982.
[8] T. Kohonen, "Analysis of a simple self-organizing process," Biological Cybernetics, vol. 44, pp. 135-140, 1982.
[9] T. Kohonen, Self-Organization and Associative Memory (Springer Series in Information Sciences 8). Heidelberg: Springer, 1984.
[10] T. Kohonen, K. Makisara, and T. Saramaki, "Phonotopic maps: insightful representation of phonological features for speech recognition," in Proc. 7th Int. Conf. on Pattern Recognition, Montreal, Canada, 1984.
[11] J. Makhoul, S. Roucos, and H. Gish, "Vector quantization in speech coding," Proc. IEEE, vol. 73, pp. 1551-1588, 1985.
[12] H. Ritter and K. Schulten, "Topology conserving mappings for learning motor tasks," in Neural Networks for Computing, J. S. Denker, Ed., AIP Conf. Proc. 151, Snowbird, UT, 1986.
[13] N. M. Nasrabadi and R. A. King, "Image coding using vector quantization: A review," IEEE Trans. Commun., vol. 36, 1988.
[14] N. M. Nasrabadi and Y. Feng, "Vector quantization of images based upon the Kohonen self-organizing feature maps," in Proc. IEEE Int. Conf. Neural Networks, San Diego, CA, 1988, pp. 101-108.
[15] J. Naylor and K. P. Li, "Analysis of a neural network algorithm for vector quantization of speech parameters," in Proc. First Ann. INNS Meeting, 1988.
[16] H. Ritter, T. Martinetz, and K. Schulten, "Topology-
