3 views

Uploaded by Lalitha Raja

- Winning Prediction Analysis in One-Day-International (ODI) Cricket Using Machine Learning Techniques
- Adrien Danny Sam Paper
- Handwritten Chinese Text Recognition by Integrating Multiple Contexts
- viva-adrina chin
- 04918628
- Soil Classification Using Image Processing and Modified SVM Classifier
- Machine Learning by Ryan Roberts
- [IJCST-V7I2P6]:Athul Krishna P.B, Raheenamol M.R, Jesna M.S and Ms. Sheena Kurian K
- A Control Scheme for Industrial Robots Using Artificial Neural Networks
- Merged
- Paper IA
- Voltage and Power Losses Control Using Distributed Generation and Computational Intelligence
- IJETR033417
- Artificial Intelligence
- IJETR032708
- Ahmed Paper1
- Machine Learning Introduction
- Knowledge Representation Techniques
- IJETTCS-2014-07-27-61
- Knoblock00 Deb

You are on page 1of 5

1

Mathematical Institute of Slovak Academy of Sciences, Dumbierska 1, Bansk Bystrica

2

University of ilina, Univerzitn 8215, ilina, Slovakia

arXiv:1311.0819v1 [cs.SD] 4 Nov 2013

We consider the ability of a very simple feed-forward ne- network.

ural network to discriminate phonemes based on just relative First, we demand that the output of the neural network is

power spectrum. The network consists of two neurons with balanced i.e. invariant with respect to loudness. We achieve

symmetric nonlinear response over a spectral range. The out- this requirement by restricting our attention to neural ne-

put of the neurons is subsequently fed to a comparator. We tworks carrying out computation

show that often this is enough to achieve complete separation

f (s) = f1 (s) f2 (s), (1)

of data. We compare the performance of found discriminants

with that of more general neurons. Our conclusion is that not where s represents sound and f1 , f2 are nonlinear functions

much is gained in passing to real-valued weights. More li- that grow additively with increase in loudness i.e.

kely higher number of neurons and preprocessing of input

will yield better discrimination results. The networks consi- f1 (s) f1 (s0 ) = f2 (s) f2 (s0 ) = d, (2)

dered are directly amenable to hardware (neuromorphic) de-

signs. Other advantages include interpretability, guarantees of if sounds s and s0 differ purely by d decibels in loudness.

performance on unseen data and low Kolmogoroffs comple- Secondly, we demand a symmetric response of neurons

xity. f1 , f2 over a spectral range. Write s = (s1 , . . . , sn ) for the

Index Terms phoneme discrimination, feed-forward discrete log-periodogram of a sound, i.e. si represents the log

neural network, neuromorphic hardware, TIMIT, memristor of power at frequency (i1) f2n S

. A spectral range r is any sub-

sequence (si , si+1 , . . . , sj ) with i j. Symmetric response

over a spectral range r of a neuron represented by k-ary func-

1. INTRODUCTION tion f means that

Artificial neural networks have emerged as one of the most f (r) = f ((r)) (3)

powerful tools in speech recognition [1], and more generally

in machine learning [2, Chapter 5]. If there is a downside for any permutation of k elements. This requirement is a

to employing neural networks, it is their opaqueness. Much strong form of requiring that the response be invariant to small

like a human brain which inspired them, it is often not clear shifts of formant frequencies. Consider a family of signals,

why they work so well. It is however possible, and even ad- spectra of three are sketched in Figure 1. Suppose one wants

visable [3, page 148], to address this opaqueness by adopting

design principles that will provide guarantees about their per-

formance in situations that did not occur during training.

In our work we introduce a new class of neural networks

that may be used for discrimination between phonemes. They

arose in connection with investigation of computational po-

wer of memristor based networks. As such, they use predo-

minantly min and max processing primitives. There are other

approaches that use the same processing elements [4], [5], [6], Fig. 1. Sketch of power spectra of three signals with a similar

[7], [8], [9] as well as a vast body of research on more gene- formant frequency.

ral fuzzy logic systems. The difference in our work is that we

Research partially supported by grants APVV-0219-12 and VEGA

to construct a function with strong response for signals in this

2/0112/11 . Usage of computational facilities of University of ilina and Uni- family with formants varying between frequencies 1 and 2 .

versity of Matej Bel is gratefully acknowledged. If one restricts oneself to linear forms, there is essentally a

single expression that has uniformly strong response for all where i , i2 are means and variances of evaluations of disc-

signals in the family, namely riminant function f over the two classes. For Z-classifiers we

have maximized the following secondary tie-breaking crite-

si + si+1 + + sj . (4)

rion

This is not true, if one tries expressions from nonlinear algeb- 2 2

ras, even as simple as the algebra generated by binary min FZ (f ) := min 1

, 2 , if sign(1 ) 6= sign(2 ). (9)

and max functions. For instance, consider functions 12 22

max(si , si+1 , . . . , sj ) (5) For our data set we used discrete spectra created by A.

max2 (si , si+1 , . . . , sj ) (6) Buja, W. Stueltze and M. Maechler and used in work [10].

It is freely available in ElemStatLearn package of R statistics

where max2 denotes the second largest element of the set. software as well as online [11]. The data set contains spectra

Both of hese functions have strong and symmetric response of five english phonemes computed from TIMIT database.

uniformly over all signals in the family. We have used custom-built C++ software for results ob-

In our work we shall be concerned with induced B- tained in the next two sections, R for verification, graphing

classifiers that determine the class of a phoneme based on and Nelder-Mead optimization in section 4 and Matlab for

comparing f (s) with a threshold , and a special subclass of computing data in Figure 4.

Z-classifiers, for which = 0.

A vast majority machine learning techniques such as support Figure 2 presents the results of classification by Z-classifiers.

vector machines, neural networks, or various regressions have From the graphs it is clear that in majority of cases, the discri-

a parameter space that forms a Riemannian manifold. Con- minant functions we considered are able to completely sepa-

sequently with these techniques one may use gradient based rate the two classes. Moreover, separation occurs with relati-

optimization mechanisms and sometimes even convex opti- vely short spectral ranges, with size at most 12. There are just

mization. The situation with our class of neural networks is two cases where separation does not occur and that is discri-

different. We need to find an optimal structure in a discrete mination of pairs aa-ao and dcl-iy.

parameter space. There are two hurdles that need to be add- It is interesting to note that passing from Z-classifiers

ressed. to more general B-classifiers does not improve results very

First, in the absence of a clever trick, one needs to search much as can be seen in Table 1.

the discrete space in a reasonable amount of time. One may

opt for a local search whereby an initial network is optimi- Change on

zed by small twists, or perhaps by a genetic algorithm. The phonemes train data test data

disadvantage of the former is that the search may end in a su- aa-ao 0.78 % 1.82 %

boptimal local minimum, whereas the latter may take a long aa-dcl 0.09 % 0%

time to find a good network. In our work we opt for exhaus- aa-iy 0% 0%

tive search of polynomially growing family, namely we will aa-sh 0% 0%

consider only functions of the form ao-dcl 0.08 % 0%

ao-iy 0% -0.17 %

f (s) = q1 (r1 ) q2 (r2 ), (7) ao-sh 0% 0%

where q1 , q2 are quantiles (e.g. max, max2 ), and r1 , r2 are dcl-iy 2.48 % 0.99 %

spectral ranges. If one considers ri of length at most L and the dcl-sh 0% -0.48 %

whole spectrum has N power points, then there are O(N 2 L4 ) iy-sh 0.07 % 0.19 %

such functions and an exhaustive search is feasible.

Secondly, one needs to define a goodness criterion. Ty- Table 1. Improvement (positive) or worsening (negative) of

pically this is classification success. However, in the case performance of B-classifiers compared to Z-classifiers

when multiple discrimination functions achieve perfect se-

paration on training data, a more refined criterion is needed.

In this context one needs to distinguish between training Z- Unlike many other classes of neural networks, the struc-

classifiers and B-classifiers. For B-classifiers, usual Fisher ture (and not only response) of our networks can be clearly

discriminant may be used, which is defined as visualized as seen in Figure 3. In the figure in horizontal scale

we indicate spectral ranges to which the two neurons are sen-

(1 2 )2 sitive. In the vertical scale we indicate maximum length of

FB (f ) := , (8)

12 + 22 spectral ranges r1 , r2 .

5 10 15 20 25 30

dcl sh iy sh

100

95

90

85

80

ao dcl ao iy ao sh dcl iy

100

% correctly classified

95

90

85

80

aa ao aa dcl aa iy aa sh

100

95

90

85

80

5 10 15 20 25 30 5 10 15 20 25 30

maximal width

Fig. 2. Train and test success rates for various pairwise Z-classifiers

4. CONTINUOUS NEIGHBORHOOD OF OPTIMAL with exactly one nonzero entry in both b1 and b2 . One may

QUANTILE CLASSIFIERS relax this condition on bi to obtain other classifiers.

One may try to improve the discrimination results by allowing Example 1. Consider the B-classifier defined by function

more general functions of a spectral range. So let us suppose

f (s) = max(s62 , s63 , . . . , s74 ) s1 , (12)

that we have spectral ranges s1 , s2 . Let us write sort() for the

function that orders its vector argument elementwise. Com-

with threshold value = 4.03279 that decides

monly used LDA tries to optimize Fishers discriminant of

functions (

phoneme is dcl if f (s) < 4.03279,

w1 r1 w2 r2 , (10) (13)

phoneme is iy if f (s) > 4.03279.

with real valued weight vectors w1 , w2 , whereas in the previ- We can consider vectors bi in (11) of the following categories

ous section we optimized separations of expressions

(monotone OWA) nonnegative, increasing entries in

q1 (r1 ) q2 (r2 ) = b1 sort(r1 ) b2 sort(r2 ), (11) each bi , with total sum equal to one,

5. FUTURE WORK

ao iy

results due to limited space. It is clear however that more rese-

30

arch is needed for this class of networks to find applications.

Let us outline the directions further research may take.

25

First, one should take into account known psychoacoustic

phenomena of human hearing. It may prove advantageous to

spectral range width

20

frequency [15]. Our experiments showed that often low fre-

quency power was crucial for discrimination and thus it may

15

prove useful to use Q-transform [16] which provides more

data points in lower frequencies compared to ordinary FFT.

10

Secondly, it is well known that it is two and sometimes up

to 4 formants that characterize a vowel. It is therefore neces-

sary to consider a more complex set of discrimination func-

5

tions. In [17] we proposed an algebra, whose elements are

candidates for describing the structure of more complex ne-

0 2000 4000 6000 8000

tworks.

Frequency (Hz)

Let us conclude with summarizing advantages of propo-

sed networks. By design, they provide a guaranteed perfor-

Fig. 3. Spectral ranges of optimal neural networks for disc-

mance on variations of trained data unseen during training,

rimination ao-iy plotted against increasing spectral range

they are interpretable and have very low Kolmogoroffs com-

width. White circle indicates the position of quantile (right is

plexity, as the example (12) shows.

the maximum, the left end and the middle would represent

the minimum and the median respectively). More graphs are

available in [12, page 83].

sum equal to one,

hods in the table use only spectral compoments s1 and the

spectral range s62 , s63 , . . . , s72 . Balanced LDA is a variant

of LDA, in which the sum of all coefficients is 0. Our conclu-

Fig. 4. Evaluation of dcl-iy classifier over a Slovak word

(IPA: /odiSla/) superimposed on PCM signal. Note that the

method train test er- Fisher Fisher

classifier was train on English data (TIMIT).

error ror train test

score score

Last, but not the least, the networks can be easily imple-

quantiles (f ) 2.7 % 5.1 % 6.22 6.54

mented in hardware, since BJT [18], CMOS [19] and even

monotone OWA 3% 4% 6.24 6.5

passive memristor implementations [20] of min, max and

OWA 3% 4.5 % 6.28 6.54

comparison operators exist. In this context it is worthwhile

ordered LDA 0.8 % 1.8 % 15 12.83

to point out that it is the change of, rather than the absolute

balanced LDA 4.2 % 5.7 % 5.69 5.85 spectral content, that can be read off from these discrimi-

LDA 1.2 % 2.2 % 13.81 12.22 nants. This is illustrated in Figure 4 where transition between

phonemes is quite strong. One may thus hypothesize that

Table 2. Comparison of LDA with ordered methods. See text

analog hardware speech recognizer could be based on silicon

for explanation of various methods.

cochlea ([21], followed by processing by a neural network

of the kind described here, whose output would be fed to

sion is that not much is gained by passing from discrete struc- adaptive differentiator like that of Delbrck and Mead [22],

tures represented by quantiles to weighted ones, in line with and finally to a memristive switch [23].

similar research [14].

References [13] R. Yager, On ordered weighted averaging aggrega-

tion operators in multicriteria decisionmaking, Sys-

[1] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. tems, Man and Cybernetics, IEEE Transactions on,

Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, vol. 18, no. 1, pp. 183190, 1988, ISSN: 0018-9472.

and B. Kingsbury, Deep neural networks for acous- DOI : 10.1109/21.87068.

tic modeling in speech recognition: the shared views

[14] D. Soudry and R. Meir. (2013). Mean Field Bayes

of four research groups, Signal Processing Magazine,

Backpropagation: scalable training of multilayer ne-

IEEE, vol. 29, no. 6, pp. 8297, 2012, ISSN: 1053-

ural networks with binary weights, [Online]. Available:

5888. DOI: 10.1109/MSP.2012.2205597.

http://arxiv.org/abs/1310.1867.

[2] C. M. Bishop, Pattern recognition and machine lear-

[15] R. Stern and N. Morgan, Hearing is believing: bio-

ning. Springer, 2006.

logically inspired methods for robust automatic spe-

[3] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A ech recognition, Signal Processing Magazine, IEEE,

learning algorithm for boltzmann machines, Cogni- vol. 29, no. 6, pp. 3443, 2012, ISSN: 1053-5888. DOI:

tive Science, vol. 9, no. 1, pp. 147 169, 1985, ISSN: 10.1109/MSP.2012.2207989.

0364-0213. DOI: http : / / dx . doi . org / 10 .

[16] J. C. Brown, Calculation of a constant q spectral

1016 / S0364 - 0213(85 ) 80012 - 4. [Online].

transform, The Journal of the Acoustical Society

Available: http://www.sciencedirect.com/

of America, vol. 89, no. 1, pp. 425434, 1991. DOI:

science/article/pii/S0364021385800124.

10.1121/1.400476. [Online]. Available: http:

[4] S. Badura, Learning of fuzzy logic circuits with me- / / scitation . aip . org / content / asa /

mory for speech recognition using video, Ph.D. thesis, journal/jasa/89/1/10.1121/1.400476.

ilinsk univerzita, 2012.

[17] O. uch. (2013). Phoneme discrimination using KS

[5] S. Foltn, Speech recognition by means of fuzzy logi- algebra I, [Online]. Available: arxiv . org / abs /

cal circuits, Ph.D. thesis, ilinsk univerzita, 2012. 1302.6031.

[6] S. Foltn, Speech recognition by means of fuzzy lo- [18] T. Yamakawa, Fuzzy logic computers and circuits,

gical circuits, in 18th international conference on soft U.S.Patent 4,875,184, 1987.

computing, MENDEL 2012, Brno, Jun. 2729, 2012.

[19] I. Baturone, S. Sanchez-Solano, A. Bariga, and J. L.

[7] S. Foltn and J. Smieko, Phoneme recognition by Huertas, Implementation of CMOS fuzzy controllers

application of genetic programming, in Sbornk prs- as mixed-signal integrated circuits, IEEE J. of Solid

pevku z mezinrodn vedeck konference, Mezinrodn State Circuits, vol. 5, no. 1, pp. 119, 1997.

konference pro doktorandy a mlade vedeck pracov-

[20] M. Klimo and O. uch. (2011). Memristors can im-

nky, Jun. 2729, 2012, pp. 22342241.

plement fuzzy logic, [Online]. Available: http : / /

[8] S. Badura, M. Klimo, and O. kvarek, Lip reading arxiv.org/abs/1110.2074.

using fuzzy logic network with memory, in Appli-

[21] B. Wen and K. Boahen, A silicon cochlea with ac-

cation of Information and Communication Technolo-

tive coupling, Biomedical Circuits and Systems, IEEE

gies (AICT), 6th International Conference, Oct. 2012,

Transactions on, vol. 3, no. 6, pp. 444 455, Dec. 2009,

pp. 14. [Online]. Available: http://ieeexplore.

ISSN : 1932-4545. DOI : 10 . 1109 / TBCAS . 2009 .

ieee.org/stamp/stamp.jsp?tp=&arnumber=

2027127.

6398471&isnumber=6398461.

[22] T. Delbrck and C.A.Mead, Adaptive photoreceptor

[9] S. Badura, S. Foltn, and M. Klimo, Fuzzy logic

with wide dynamic range, in 1994 International Sym-

networks for speech recognition, Communications,

posium on Circuits and Systems, ISCAS94, 1994, 339

Scientific Letters of University of ilina, pp. 1318, 2

342, vol.4.

2013.

[23] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S.

[10] T. Hastie, A. Buja, and R. Tibshirani, Penalized disc-

Wiliams, The missing memristor found, Nature, vol.

riminant analysis, The Annals of Statistics, vol. 23, no.

453, pp. 8083, 2008, doi:10.1038/nature06932.

1, pp. 73102, 1995.

[11] (Dec. 7, 2012). English phonemes, [Online]. Available:

http://www- stat.stanford.edu/~tibs/

ElemStatLearn/datasets/phoneme.data.

[12] O. uch. (2013). Using min-max circuits for speech

recognition, [Online]. Available: http : / / www .

savbb.sk/~ondrejs/Phoneme/book.pdf.

- Winning Prediction Analysis in One-Day-International (ODI) Cricket Using Machine Learning TechniquesUploaded byInternational Journal Of Emerging Technology and Computer Science
- Adrien Danny Sam PaperUploaded byAzri Mohd Khanil
- Handwritten Chinese Text Recognition by Integrating Multiple ContextsUploaded byieeexploreprojects
- viva-adrina chinUploaded byapi-356886215
- 04918628Uploaded byBiswanath Dekaraja
- Soil Classification Using Image Processing and Modified SVM ClassifierUploaded byEditor IJTSRD
- Machine Learning by Ryan RobertsUploaded byBigJ
- [IJCST-V7I2P6]:Athul Krishna P.B, Raheenamol M.R, Jesna M.S and Ms. Sheena Kurian KUploaded byEighthSenseGroup
- A Control Scheme for Industrial Robots Using Artificial Neural NetworksUploaded byKansocrah Nak
- MergedUploaded byAnonymous 0MuHnGsfH
- Paper IAUploaded byJose Carvajal
- Voltage and Power Losses Control Using Distributed Generation and Computational IntelligenceUploaded byernesto
- IJETR033417Uploaded byerpublication
- Artificial IntelligenceUploaded byYashwanth Varma
- IJETR032708Uploaded byerpublication
- Ahmed Paper1Uploaded bymona4
- Machine Learning IntroductionUploaded byAlex David
- Knowledge Representation TechniquesUploaded byBuddhika Prabath
- IJETTCS-2014-07-27-61Uploaded byAnonymous vQrJlEN
- Knoblock00 DebUploaded byzzztimbo
- ICU Patient Deterioration Prediction : A Data-Mining ApproachUploaded byCS & IT
- Neural Computation for Adaptive Gait Control of the Quadruped Over Rough TerrainUploaded byMer Fro
- Velloso TNSM10 SlideUploaded bysridharegsp
- dorffnerUploaded bym. a.
- 2CS231n Convolutional Neural Networks for Visual RecognitionUploaded byDonna
- SC - wikiUploaded byrudy_tanaga
- isearchUploaded byapi-360927804
- Obstacle Avoidance Methods for a Laser Based Intelligent WheelchairUploaded bypvdai
- 15 1458195438_17-03-2016Uploaded byEditor IJRITCC
- Dengue Vector Larval Habitats in an Urban EnvironmentUploaded byJesús Poot Islas

- extIPAChart2008Uploaded bySantiago Guerrero
- Student SPD Checklist for TeachersUploaded byMaria Guy Del Duca
- Student SPD Checklist for TeachersUploaded byMaria Guy Del Duca
- Tamilweb_ Grammar NotesUploaded byLalitha Raja
- 5th-Call for Proposals in DST-InRIA-CNRS Targeted ProgrammeUploaded byLalitha Raja
- unit 4Uploaded byLalitha Raja
- 10.1.1.113.2960Uploaded byLalitha Raja
- Sensory Processing Disorder ChecklistUploaded byLalitha Raja
- Early Language Development-Trends in Language Acquisition Research.pdfUploaded byret wahyu
- General GuidelinesUploaded byLalitha Raja
- 10936_2016_Article_9469Uploaded byLalitha Raja
- application_format.pdfUploaded byLalitha Raja
- SRCD2007_2-105_132_CochUploaded byLalitha Raja
- Rahu Ketu EngUploaded byLalitha Raja
- 1 (8)Uploaded byLalitha Raja
- 0012 Genesee BrainUploaded byLalitha Raja
- Educator Panel - KROLLUploaded byLalitha Raja
- Flash CardsUploaded byLalitha Raja
- Behavioral BrainUploaded byLalitha Raja
- Voice Disorders 3Uploaded byLalitha Raja
- Tracing Traditional Roots of Telugus Settled in Tamil Nadu - The HinduUploaded byLalitha Raja
- StutteringUploaded byscoviatugonza
- MTEUploaded byLalitha Raja
- 02 Central Nervous System Ppt 2027Uploaded bysajida_hajju
- Prof. N. NatarajapillaiUploaded byilasundaram
- Chapter 1Uploaded byLalitha Raja
- Cohesion and CoherenceUploaded byLalitha Raja
- Pragmatics Profile ChildrenUploaded byLalitha Raja
- 17 VowelsUploaded byZulma Cardona

- The Wishing Cupboard S3Uploaded byS TANCRED
- 0-0000 (Table of Contents & Preface)Uploaded byZafar Wah
- Capr i En6271Uploaded bykale sanjay
- 10.3 Dispraksia & DisfasiaUploaded bywsphuah
- Borang Pemeriksaan Kesihatan (Pin.1-2014)Uploaded byAAyel
- Roadmap to Residency 2ndEdUploaded byJohn David Rodriguez
- Cj52 Public PartUploaded bybarryhopewell
- Implementation AnalysisUploaded byDervish
- australiancurriculum media arts 7-10Uploaded byapi-281947898
- Association of Mexican American Educators. Vol VI, Issue 3Uploaded byColigny1977
- How to Make Booatable USBUploaded byT.h. Min
- Aug 6-10 CNFUploaded byMylz Villarin
- 2009 ASTD State of the Industry ReportUploaded byproceoac
- Uppcl AdmitUploaded byshanukshapreeti
- Passive VoiceUploaded byDavid Steinweg
- NRVEFinalReportUploaded byraj
- Richard Fine USDC CivilRightsComplaint1-6-10Uploaded byLeslie Dutton
- Default Password ListUploaded byycescudero
- Replication Studies_ Bad Copy _ Nature News & CommentUploaded byDarkoPolšek
- PLS Newsletter - May 2014Uploaded bySean Leber-Fennessy
- GMC Relationship ManagerUploaded bysanthosh.ramdoss
- Elevator Control System ProjectUploaded byChandra Prakash Jain
- 06798_AdventWorldBookandStudentTextUploaded byDouglas Reis
- The Final ThesisUploaded byAlexis Martinez Constantino
- Towards a School Wide Model of Social JusticeUploaded byMiguel Alejo
- 204-Neuro Basics.pdfUploaded byHadi El-Maskury
- L1-EMM 3233 REVD KPM.docxUploaded byMat Harzick
- Lesson 7 - IT for Higher Thinking Skills and CreativityUploaded byElle Kwon
- cte electronics literacy toolkit pdfUploaded byapi-290593844
- walker orchestra handbook 2018-2019Uploaded byapi-417268559