
Mahalanobis-Taguchi System: Theory and Applications

Yu-Hsiang Hsiao
Department of Industrial Engineering and Engineering Management
National Tsing Hua University
Hsinchu 300, Taiwan
e-mail: Adrian.iem88@nctu.edu.tw
to differentiate different quality levels of saxophones. It
is worth noting that the pitch can be easily checked
using an apparatus, while the timbre quality mainly
depends on the musician's hearing judgment. However,
the sensitivity and stability of human perception can
be influenced by many factors, such as emotions
(psychological) and tiredness (physiological) [1], which
may decrease the reliability of timbre quality
inspection. Actually, the saxophone timbre quality
judgment task implemented by the professional musicians
in the final inspection stage can be regarded as a musical
instrument timbre classification problem.
Generally, the framework of a musical instrument
timbre classification task involves two phases: the
parameterization phase and the classification phase.
Because sound signals presented as raw waveforms
cannot be directly processed by a classification algorithm,
the parameterization phase attempts to extract a wide set
of quantified features from the sound signals to
appropriately model their temporal and spectral
characteristics. The parameterization can be achieved
via various signal analysis techniques considering the time,
frequency, or time-frequency domain [2]-[6]. Much
work has been done to identify the important
perceptual features for timbre recognition [8], [10]-[12],
[19], [20]. After the features characterizing the original
sound signals are extracted in the parameterization phase,
classification algorithms from the statistics, soft
computing, or machine learning domains [13]-[16], [18]
are then employed to accomplish the musical instrument
timbre classification or recognition tasks.
In this study, the multi-class Mahalanobis-Taguchi
system (MMTS) was proposed for multi-class
classification and feature selection. MMTS breaks the
limitation of MTS in which only one Mahalanobis space
is constructed for one problem, and establishes an
individual Mahalanobis space for each class to
simultaneously accomplish the multi-class classification
and feature selection tasks.
Also, an automatic
multi-class timbre classification system (AMTCS) was
established in order to increase the accuracy and reliability
of timbre quality inspection in saxophone manufacture
and to prevent the judgment bias caused by human
perception. The AMTCS is composed of our proposed
waveform shape-based feature extraction method
(WFEM) in the parameterization phase and MMTS in the
classification phase. From a different point of view, the
WFEM attempts to parameterize the sound signals by
directly capturing the shape properties of the signal waveform
instead of summarizing the signal behavior in the time or
frequency domain, as has been done in most research
work. Finally, a real case about the inspection of
saxophone timbre quality using the verified AMTCS is
presented.

Abstract
A novel multi-class classification and feature selection
algorithm called the multi-class Mahalanobis-Taguchi System
(MMTS) is proposed in this study. Also, in order to
improve the reliability of saxophone timbre quality
inspection, an automatic multi-class timbre classification
system (AMTCS) is developed. The AMTCS is composed
of our proposed waveform shape-based feature extraction
method (WFEM) in the parameterization phase and MMTS
in the classification phase. Through employing the AMTCS,
strong assistance was provided to the inspection of
saxophone timbre quality, and a perfect identification rate
on saxophones with different timbre quality levels was
achieved.
Keywords: Classification, Feature selection, Feature
extraction, Mahalanobis-Taguchi System, Timbre.
1. Introduction
Mahalanobis-Taguchi system (MTS) developed by
Dr. Taguchi is a collection of methods that was proposed
as a forecasting, classification, and feature selection
technique using multivariate data [21]. So far, MTS has
been successfully used in various applications.
Nevertheless, all of these applications are restricted to
two-class problems due to the limitation of MTS
algorithm that only one Mahalanobis space can be
constructed for one problem. In order to enhance the
practicality in real world, effectively extending MTS
algorithm to support multi-class problems is non-trivial.
Currently, the most popular method of extending a
two-class classification algorithm to a multi-class one is to
decompose a multi-class problem into a collection of
two-class problems [25]-[28].
Although the
decomposition strategies have been extensively utilized
because of their uncomplicated concept and usefulness,
they are not without limitations.
Mostly, the
decomposition strategies dont have the capability of
supporting a binary algorithm for implementing the
feature selection on whole multi-class problem even if the
algorithm can do that well in a two-class problem.
Therefore, in order to effectively extend MTS to support
the multi-class classification problems and at the same
time ensure a well carried out feature selection, it is worth
developing a new MTS algorithm that can simultaneously
handle more than two classes.
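As a sketch of the decomposition idea discussed above (not of MTS itself), the one-vs-rest strategy trains one binary model per class and predicts the highest-scoring class; the `BinaryClassifier` below is a hypothetical stand-in for any two-class algorithm:

```python
# One-vs-rest decomposition of a k-class problem into k two-class problems.
# BinaryClassifier is a toy stand-in (distance to the positive-class mean);
# any real two-class algorithm could take its place.
import numpy as np

class BinaryClassifier:
    """Toy two-class scorer: higher score = closer to the positive class."""
    def fit(self, X, y):
        self.mean_pos = X[y == 1].mean(axis=0)
        return self
    def score(self, X):
        return -np.linalg.norm(X - self.mean_pos, axis=1)

def one_vs_rest_fit(X, labels, classes):
    # Train one binary model per class: that class vs. all the others.
    return {c: BinaryClassifier().fit(X, (labels == c).astype(int))
            for c in classes}

def one_vs_rest_predict(models, X):
    # Score every example against every class and take the argmax.
    scores = np.stack([m.score(X) for m in models.values()], axis=1)
    keys = list(models.keys())
    return [keys[i] for i in scores.argmax(axis=1)]
```

Note that each binary model sees only a "class c vs. rest" relabeling, which is exactly why per-class feature selection by a binary algorithm does not directly yield a feature subset for the whole multi-class problem.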
Besides, despite today's highly developed automation,
the manufacture of saxophones is still a
non-automatic process and relies heavily on highly skilled
technicians. It is possible that some unobvious
imprecision that affects the sound quality may be
introduced during handmade manufacture. Thus, sound
quality is tested in the final inspection by professional
musicians to ensure that the timbre quality and other sound
elements are within acceptable specifications, and further

2. Proposed Automatic Multi-class Timbre Classification System

2.1 Parameterization Phase: Waveform Shape-based Feature Extraction Method

In this study, a simple sound signal feature extraction method, called WFEM, based on characterizing the shape of the signal waveform was proposed. Generally, four parts, attack, decay, sustain, and release, can be identified in a sound waveform according to the energy [16]. Since the sustain part is considered one of the most important parts for timbre recognition and has relatively stable fundamental waveform shapes over time, the features characterizing the waveform shape of the sustain part were considered in the WFEM. The procedures of our proposed WFEM for isolated musical instrument tones are detailed as follows.

Step 1: Frame and Separate Out the Sustain Part of a Musical Instrument Tone
A signal of a musical instrument tone is framed, and the sustain part, which has stable energy, is separated out. In general, a frame should contain more than one fundamental period of the signal and is typically set to 20 to 30 ms with some overlap. Let E be the frame size and y[e] be the amplitude of the e-th sample in a frame. The energy of a frame can be calculated by (1).

    Energy = Σ_{e=1}^{E} (y[e])²    (1)

Step 2: Determine the Upper and Lower Bound for Each Frame
Let +y_max and −y_max be the upper and lower bound of a frame, respectively. The y_max is determined by the following equation:

    y_max = max_{e=1,2,…,E} |y[e]|    (2)

Step 3: Set Amplitude Levels for Each Frame
The interval between any two consecutive amplitude levels is set to be uniform, and the width of an interval, i.e. S, is determined as follows:

    S = 2·y_max / (2λ)    (3)

where 2λ is the number of intervals between the upper and lower bound of a frame. Therefore, the L-th amplitude level, i.e. A_L, can be set as:

    A_L = L·S,  L = 0, ±1, ±2, …, ±(λ−1)    (4)

The appropriate number of amplitude levels depends on the case and is the outcome of a trade-off between rich information and low complexity.

Step 4: Derive the Variation Property on Each Amplitude Level
The variation property on an amplitude level is recorded by calculating the level-crossing amount, i.e., the number of intersections of the signal waveform with the amplitude level. The level-crossing amount on A_L, i.e. CA_L, can be calculated as follows:

    δ_L(e) = 1, if (y[e] − A_L)(y[e+1] − A_L) ≤ 0
    δ_L(e) = 0, if (y[e] − A_L)(y[e+1] − A_L) > 0    (5)

    CA_L = Σ_{e=1}^{E−1} δ_L(e)    (6)

Obviously, a signal with a higher fundamental frequency will have a larger level-crossing amount in a frame. In order to eliminate this effect, the proportion of the level-crossing amount on each amplitude level to that on the zero amplitude level is calculated as the variation property. The equation of the variation property on A_L, i.e. VA_L, is shown as (7). VA_0 is ignored because it is permanently equal to 1.

    VA_L = CA_L / CA_0    (7)

Step 5: Derive the Existence Property on Each Amplitude Level
The existence property on an amplitude level is recorded by calculating the level-integral value of the waveform over the range from the first sample to the final sample in a frame on the basis of each amplitude level. Because the sound is digitally recorded, the waveform is formed by linearly connecting adjacent samples. Thus, the integral of the waveform can be perfectly approximated by the Trapezoid Rule [22], a method for approximately calculating the integral of a function over a definite range. The Trapezoid Rule approximates the integral of a function by fitting several trapezoids to the region under the function and summing their areas. Let W_L(y) be the A_L-centered waveform function. By the Trapezoid Rule with the width of each trapezoid equal to 1, the level-integral value of W_L(y) over the whole frame range, i.e. IN_L, can be computed using (8).

    IN_L = ∫ W_L(y) dy ≈ Σ_{e=1}^{E−1} [(y[e] − A_L) + (y[e+1] − A_L)] / 2    (8)

Similarly, homogeneous signals with different fundamental frequencies will result in different integral values even on the same amplitude level. Thus, in order to eliminate the effect of different fundamental frequencies, the proportion of the level-integral value on each amplitude level to that on the lower bound is calculated as the existence property. The equation of the existence property on A_L, i.e. EX_L, is shown as (9).

    EX_L = IN_L / IN_{−y_max}    (9)

where

    IN_{−y_max} = Σ_{e=1}^{E−1} { [y[e] − (−y_max)] + [y[e+1] − (−y_max)] } / 2    (10)

Step 6: Obtain the Waveform Shape-based Feature Set
The average of the variation and existence properties on each amplitude level over all frames is taken as the feature set of a musical instrument tone. That is, for a musical instrument tone, a set of 4λ−3 features is extracted, composed of 2λ−2 features describing the variation property (VA_L, where L = ±1, ±2, …, ±(λ−1)) and 2λ−1 features describing the existence property (EX_L, where L = 0, ±1, …, ±(λ−1)).
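The per-frame computations of Steps 2 to 5 can be sketched as follows; this is a minimal illustration under the stated conventions, not the authors' implementation, and the function name `wfem_frame_features` is hypothetical:

```python
# Sketch of WFEM per-frame features (equations (1)-(10)) for a frame y
# taken from the sustain part; sustain detection and averaging over frames
# are omitted. lam corresponds to half the number of intervals (2*lam = 10
# in the case study).
import numpy as np

def wfem_frame_features(y, lam=5):
    energy = np.sum(y ** 2)                    # (1) frame energy
    y_max = np.max(np.abs(y))                  # (2) bound of the frame
    S = 2 * y_max / (2 * lam)                  # (3) interval width

    # (4) amplitude levels A_L for L = -(lam-1), ..., 0, ..., (lam-1)
    Ls = np.arange(-(lam - 1), lam)
    A = Ls * S

    # (5)-(6) level-crossing amount CA_L: count sign changes around A_L
    CA = {int(L): int(np.sum((y[:-1] - a) * (y[1:] - a) <= 0))
          for L, a in zip(Ls, A)}

    # (7) variation property: crossings normalized by zero-level crossings
    VA = {L: CA[L] / CA[0] for L in CA if L != 0}

    # (8) level-integral IN_L by the Trapezoid Rule (trapezoid width 1)
    def level_integral(level):
        return np.sum(((y[:-1] - level) + (y[1:] - level)) / 2.0)

    IN = {int(L): level_integral(a) for L, a in zip(Ls, A)}
    IN_lower = level_integral(-y_max)          # (10) integral on the lower bound

    # (9) existence property: integrals normalized by the lower-bound integral
    EX = {L: IN[L] / IN_lower for L in IN}

    return VA, EX, energy
```

With lam = 5 this yields the 8 variation and 9 existence features per tone reported in the case study.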

2.2 Classification Phase: Multi-class Mahalanobis-Taguchi System

To surmount the restriction on the suitability of using MTS for multi-class problems, the MMTS was proposed according to the framework of MTS and is composed of four main implementation stages. In this study, the MMTS was employed as the main algorithm in the classification phase of the musical instrument timbre classification task. The four stages of implementing MMTS are detailed as follows.

Stage 1: Construction of a Full Model Measurement Scale with the Mahalanobis Space of Each Class as the Reference
In this stage, the problem and all related features are defined, and representative examples are collected to construct an individual Mahalanobis space for each class and establish the full model measurement scale. For each of the k classes, we define the examples sampled from its population as normal, while the examples coming from the other k−1 classes are defined as abnormal. The Mahalanobis space MS_i can be regarded as a database containing three components of class C_i: the feature mean vector, the feature standard deviation vector, and the covariance matrix. In order to enhance the accuracy of constructing the measurement scale, the input features are orthogonalized by the Gram-Schmidt orthogonalization process [21] to eliminate the multicollinearity among features that makes the correlation matrix almost singular and its inverse invalid. The Gram-Schmidt coefficient t_lq^{(p)} of MS_p is set as follows for l = 1, 2, …, d and q = 1, 2, …, l−1:

    t_lq^{(p)} = A_l^{(p)T} U_q^{(p)} / ( U_q^{(p)T} U_q^{(p)} )    (11)

where A_l^{(p)} is the l-th standardized feature vector of MS_p and U_q^{(p)} is the Gram-Schmidt vector of the q-th feature of MS_p. The Gram-Schmidt orthogonalization process is then given by:

    U_{l(i)}^{(p)} = A_{l(i)}^{(p)} − Σ_{q=1}^{l−1} t_lq^{(p)} U_{q(i)}^{(p)}    (12)

where A_{l(i)}^{(p)} is the l-th feature vector of MS_i standardized by MS_p, and U_{q(i)}^{(p)} is the Gram-Schmidt vector of the q-th feature of MS_i orthogonalized on the basis of MS_p. Now, the Mahalanobis distance from any example r to C_i can be calculated using the Gram-Schmidt orthogonalization process as:

    MD_r^{(i)} = (1/d) Σ_{q=1}^{d} ( u_{rq}^{(i)} / σ_{q(i)} )²    (13)

where d is the number of features; u_{rq}^{(i)} is the Gram-Schmidt value of the q-th feature of example r processed by MS_i; and σ_{q(i)} is the standard deviation of the q-th Gram-Schmidt feature vector of MS_i. For the normal examples in MS_i, we compute their Mahalanobis distances to C_i using the Gram-Schmidt orthogonalization process (i = 1, 2, …, k). With these Mahalanobis distances, we can define the center point and the unit distance for each class, by which the reference base for the measurement scale is determined.

Stage 2: Validation of the Full Model Measurement Scale
In this stage, the effectiveness of discriminating different classes using the full model measurement scale is validated. For this purpose, the Mahalanobis distance to each Mahalanobis space is calculated for each example. The measurement scale is then validated by examining the separability of the Mahalanobis distances corresponding to examples of different classes. If there is no significant difference between the normal and abnormal Mahalanobis distances, it implies that the constructed Mahalanobis space cannot suitably represent the corresponding real normal condition; in that case we should return to the beginning of the whole problem and check the completeness of the considered features or the representativeness of the examples collected to construct the Mahalanobis space.

Stage 3: Identification of the Important Features
In this stage, orthogonal arrays and the signal-to-noise ratio are used to identify the important features for multi-class classification. Inside an orthogonal array, every row (run) represents a different combination of features used for Mahalanobis space construction. The signal-to-noise ratio η corresponding to each run is computed using the larger-the-better concept and is defined as:

    η = −10 log₁₀ [ (1/K) Σ_{i=1}^{k} Σ_{p=1, p≠i}^{k} (1/n_i) Σ_{j=1}^{n_i} MD_{j(i)}^{(p=i)} / MD_{j(i)}^{(p≠i)} ]    (14)

where n_i is the number of examples in the Mahalanobis space MS_i; MD_{j(i)}^{(p=i)} is the Mahalanobis distance from the j-th example in MS_i to class C_p with p = i; MD_{j(i)}^{(p≠i)} is the Mahalanobis distance from the j-th example in MS_i to class C_p with p ≠ i; and K is the number of (i, p) combinations with p ≠ i. After the signal-to-noise ratio of every run is obtained, the effect gain of each feature can be calculated as:

    Gain_l = SN_l^+ − SN_l^−    (15)

where SN_l^+ is the average signal-to-noise ratio of all runs including the l-th feature, and SN_l^− is the average signal-to-noise ratio of all runs excluding the l-th feature. If the effect gain corresponding to a feature is positive, the feature may be important and considered worth keeping; a feature with a negative effect gain should be removed.
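The Gram-Schmidt orthogonalization and scaled Mahalanobis distance used above can be sketched as follows; this is a minimal single-class illustration under stated assumptions, not the authors' code, and the function names are hypothetical:

```python
# Sketch of the Gram-Schmidt-based Mahalanobis distance: the reference
# class's standardized feature vectors are orthogonalized column by column,
# a new example is transformed with the same coefficients, and its distance
# is the average of the squared scaled Gram-Schmidt values.
import numpy as np

def gram_schmidt(Z):
    """Orthogonalize the columns of Z in order; return vectors U and coefficients T."""
    n, d = Z.shape
    U = np.zeros_like(Z)
    T = np.zeros((d, d))
    for l in range(d):
        u = Z[:, l].copy()
        for q in range(l):
            T[l, q] = (Z[:, l] @ U[:, q]) / (U[:, q] @ U[:, q])
            u -= T[l, q] * U[:, q]
        U[:, l] = u
    return U, T

def gs_transform(z, T):
    """Apply the same orthogonalization to one standardized example z."""
    u = np.zeros(len(z))
    for l in range(len(z)):
        u[l] = z[l] - sum(T[l, q] * u[q] for q in range(l))
    return u

def mahalanobis_distance(ref, x):
    """Scaled distance of example x to the class given by reference examples
    `ref` (rows = examples, columns = features)."""
    mean = ref.mean(axis=0)
    std = ref.std(axis=0, ddof=1)
    Z = (ref - mean) / std           # standardize by the reference class
    U, T = gram_schmidt(Z)
    s = U.std(axis=0, ddof=1)        # std of each orthogonalized feature
    u = gs_transform((x - mean) / std, T)
    return np.mean((u / s) ** 2)     # average over the d features
```

In MMTS this computation is repeated once per class, each class supplying its own mean, standard deviation, and Gram-Schmidt basis.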
Stage 4: Future Prediction with the Important Features

In this stage, a reduced model measurement scale is constructed using the important features, and a proposed weighted Mahalanobis distance is employed to verify the measurement scale and to serve as the distance metric for classification. The weighted Mahalanobis distance, which weights the different features according to the corresponding effect gains obtained in the third stage, is used for classification after the reduced model measurement scale is validated. The weighted Mahalanobis distance from any example r to C_i is computed through the following equation:

    WMD_r^{(i)} = Σ_{l∈R} w_l ( u_{rl}^{(i)} / σ_l^{(i)} )²    (16)

where R is the set of features in the reduced model; u_{rl}^{(i)} is the Gram-Schmidt value of the l-th feature of example r processed by MS_i in the reduced model; σ_l^{(i)} is the standard deviation of the l-th Gram-Schmidt feature of MS_i in the reduced model; and w_l is the weight of the l-th feature in the reduced model, defined as follows:

    w_l = Gain_l / Σ_{l∈R} Gain_l    (17)

where Gain_l is the effect gain of the l-th feature in the reduced model. By classifying examples into the class with the minimum weighted Mahalanobis distance, the classification accuracy can be acquired. Finally, a test should be implemented using unknown examples to confirm the classification ability of the reduced model.

In this case study, we attempt to establish an automatic multi-class saxophone timbre classification system using our proposed AMTCS to assist the musicians in implementing the timbre quality inspection more reliably and to prevent the judgment bias caused by human perception. For this purpose, 150 alto saxophones, 50 for each timbre quality level, were collected as the analyzed examples from the final inspection station. For each example, 15 isolated tones from c1 to c3 with no vibrato were performed by the professional musicians with almost the same intensity, and were recorded as 4-second monophonic signals sampled at 44.1 kHz with 16-bit resolution. Each tone was divided into frames of 25 ms with 30% overlap between adjacent frames, and the sustain part was separated out according to the signal energy for deriving the features. In this case, the number of intervals, 2λ, between the upper and lower bound of a frame was set to 10. Fig. 1 shows the upper and lower bounds and the amplitude levels of a frame from the tone d1 produced by a high quality saxophone. Thus, 17 features, 8 describing the variation property and 9 the existence property, were extracted from each tone, yielding a total of 255 features (17 features for each of the 15 tones) for a saxophone example.
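A minimal sketch of the Stage 4 classification rule, equations (16) and (17), assuming the reduced-model Gram-Schmidt values and standard deviations have already been obtained as in the earlier stages:

```python
# Weighted Mahalanobis distance classification: weights are the normalized
# effect gains (17), the distance follows (16), and an example is assigned
# to the class with the minimum weighted distance.
import numpy as np

def wmd(u, sigma, gains):
    """Weighted Mahalanobis distance for one class, eqs. (16)-(17).
    u, sigma: Gram-Schmidt values and stds over the reduced feature set R."""
    w = gains / gains.sum()                  # (17) normalized effect gains
    return np.sum(w * (u / sigma) ** 2)      # (16)

def classify(per_class_u, per_class_sigma, gains):
    """Return the index of the class with the minimum weighted distance."""
    d = [wmd(u, s, gains) for u, s in zip(per_class_u, per_class_sigma)]
    return int(np.argmin(d))
```

The hypothetical inputs here are per-class arrays of the example's Gram-Schmidt values and the corresponding standard deviations, one pair per Mahalanobis space.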


3. Case Study
Houli, Taiwan is one of the main production places of
saxophones in the world. At present, 16 of the 25
Taiwanese musical instrument manufacturers are located
in Houli, and the whole industry employs around 1,500
people. In addition to developing OBM products, most
manufacturers in Houli have long been authorized to
produce musical instruments for many globally famous
musical instrument companies (OEM). According to
statistics from the Ministry of Economic Affairs, Taiwan,
around one third of the saxophones in the world are made
in Houli, and the annual output value reached one billion
New Taiwan dollars in 2006, around 50% of the total
value in the world.
The real case studied here comes from a saxophone
manufacturer located in Houli, Taiwan. The alto saxophone
is one of the main products of this manufacturer. At present,
the finished alto saxophones are grouped into three quality
levels according to their timbre, which is subjectively
judged by professional musicians in the final
inspection stage of the manufacturing procedure. The
three quality levels are high quality, qualified quality, and
adjustment needed. However, the efficiency of the
inspection process is always low because the professional
musicians need to rest periodically to prevent
hearing tiredness. Moreover, a wrong judgment of
saxophone quality may lead to a huge loss of the
manufacturer's credit in the market.
Fig. 1 The upper and lower bounds (±y_max = ±0.7344) and the amplitude levels of a frame from the tone d1 produced by a high quality saxophone.
A metric called the multiple Mahalanobis distance [21]
was used when implementing the MMTS of the AMTCS in this
case study. The multiple Mahalanobis distance method is
very useful in situations where a large number of
variables can be divided into several meaningful subsets
of local variables. In the multiple Mahalanobis
distance method, the Mahalanobis distance is calculated
using the local variables of each subset, and the Mahalanobis
distances of all subsets belonging to an example form a
new variable vector. MMTS is then implemented on
the dataset with the new variable vectors. In this way,
not only is the complexity of the problem reduced, but the
important variables selected by MMTS can also have more
practical meaning. In the saxophone case, the multiple
Mahalanobis distance method was used because the 255
features of a saxophone example can be divided into 15
feature groups according to the tones the features
come from. Moreover, it is more valuable to learn
which tones cause the different quality levels of
saxophones than to review the individual

and LVQ, feature selection was not considered.
Table 2 shows the classification results and the percentage
of tone reduction achieved by each classification system.
The 0.95 confidence interval for the classification
accuracy is shown in Fig. 2. The results indicated that
although the classification performances of the systems are
not significantly different, MMTS, SDA, and kNN
achieved 100% identification accuracy on average using
our proposed waveform shape-based features. Also, the
classification system involving MFCC yielded 100%
identification accuracy on average and has the same
percentage of tone reduction as AMTCS. The systems
containing C4.5 and RST used fewer tones, but they obtained
lower accuracy rates of 92.31% and 96.31%, respectively.
Although SDA and kNN performed as perfectly as MMTS
using the waveform shape-based features, MMTS used much
fewer tones to discriminate the saxophones with different
timbre quality, which made the classification more
efficient.

importance of each of the 255 features. Therefore, for a
saxophone example, the Mahalanobis distance of each
tone was first calculated using the corresponding 17
features, resulting in a new feature vector
composed of 15 Mahalanobis distances. The MMTS was
then implemented on the dataset containing 150
saxophone examples, each with 15 feature
values representing the 15 tones.
The 5-fold stratified cross-validation strategy was
applied for the classification of the three saxophone quality
levels. The classification and feature selection results
yielded by our proposed AMTCS are shown in Table 1.
The results indicated that AMTCS achieved perfect
identification accuracy. Also, information about the
importance of each tone for discriminating the different
quality levels of saxophones was revealed. Tones c1,
d1, e1, f1, g1, c2, e2, and f2 were identified in more
than three of the five folds, and were thus considered
important for discriminating the saxophone quality levels.
Through employing our proposed AMTCS, strong
assistance was provided in implementing the final timbre
inspection of alto saxophones. This not only increases
the reliability of timbre quality judgment but also
reduces the effort required of professional musicians'
hearing for discriminating the different quality levels.
Besides, through the feature selection function, the
significant tones having an impact on saxophone timbre
quality can be easily identified. This not only provides
direction for the follow-up saxophone adjustment but
also informs the manufacturer about the critical points for
improving the timbre quality of the alto saxophone.

Table 1 Classification and Feature Selection Results of AMTCS on Saxophone Case
(Feature selection was recorded per fold, folds 1-5, over the tones c1, d1, e1, f1, g1, a1, b1, c2, d2, e2, f2, g2, a2, b2, c3; AMTCS achieved a classification accuracy of 100.00% for Need Adjustment, Qualified Quality, High Quality, and Overall.)

Table 2 Classification Results and Tone Reduction of Saxophone Case

  Classification    Classification Accuracy (%)                                Tone
  System            Need Adjustment  Qualified Quality  High Quality  Overall  Reduction
  AMTCS                 100.00           100.00           100.00      100.00     52%
  WFEM+SVM               98.57           100.00           100.00       99.52      -
  WFEM+C4.5              88.73            98.00            90.26       92.31     79%
  WFEM+RST               94.57            97.78            96.57       96.31     75%
  WFEM+SDA              100.00           100.00           100.00      100.00      4%
  WFEM+LVQ               98.00            98.33           100.00       98.78      -
  WFEM+kNN              100.00           100.00           100.00      100.00      -
  WFEM+BPN               98.18           100.00            97.50       98.56      -
  MFCC+MMTS             100.00           100.00           100.00      100.00     52%

("-" marks systems without a feature selection function, for which no tone reduction applies.)

Fig. 2 The 0.95 confidence interval of saxophone classification accuracy.

Besides, in order to assess and confirm the quality
of the proposed waveform-based features used for
classification, an entropy-based measurement called
symmetrical uncertainty [24] was adopted.
The symmetrical uncertainty method attempts to rank the
worth of a feature by measuring its symmetrical
uncertainty value with respect to the class. The
symmetrical uncertainty value is restricted to the range [0, 1],
with the value 1 indicating that knowledge of the feature
value completely predicts the class value and the value 0
indicating their independence. In this assessment, a
total of 195 Mel-frequency cepstral coefficients
(MFCC, 13 features for each of the 15 tones) and 255
waveform-based features (17 features for each of the 15
tones) were considered. Table 7 lists the ranking of the first 30
important features and their symmetrical uncertainty (SU)
values. The result indicates that 22 of the first 30
important features are occupied by our proposed
waveform-based features, which provides the evidence

In order to verify the effectiveness of our proposed
method, classification algorithms including the k-nearest
neighbor classifier (kNN), support vector machine
(SVM), back-propagation neural network (BPN), learning
vector quantization (LVQ), stepwise discriminant analysis
(SDA), decision tree (C4.5), and rough set theory (RST)
were also employed for comparison. Also, the first
13 Mel-frequency cepstral coefficients (MFCC), which are
generally acknowledged as the most important feature
scheme for musical sound identification, were considered
in the comparison. The 5-fold stratified cross-validation
strategy was applied for these algorithms, and all 255
waveform shape-based features belonging to the 15 tones
were used. For the classification algorithms with a feature
selection function, including MMTS, SDA, C4.5, and RST,
a tone is considered important if any feature belonging to
the tone is selected. With regard to kNN, SVM, BPN,

that our proposed WFEM is comparable.
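The symmetrical uncertainty measure can be sketched as follows for discrete values (a minimal illustration, not the code used in the study; continuous features would first be binned):

```python
# Symmetrical uncertainty SU(X, C) = 2 * I(X; C) / (H(X) + H(C)),
# bounded in [0, 1]: 1 means the feature fully predicts the class,
# 0 means the feature and the class are independent.
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((k / n) * np.log2(k / n) for k in Counter(values).values())

def symmetrical_uncertainty(x, c):
    hx, hc = entropy(x), entropy(c)
    hxc = entropy(list(zip(x, c)))      # joint entropy H(X, C)
    mi = hx + hc - hxc                  # mutual information I(X; C)
    return 2 * mi / (hx + hc) if hx + hc > 0 else 0.0
```

Ranking features by this value, as done in the assessment above, requires no classifier and is symmetric in the feature and the class.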


4. Conclusions
Effective assistance in increasing the reliability
of alto saxophone timbre inspection and discrimination
was achieved by our proposed AMTCS in this study.
Through employing the AMTCS, strong assistance was
provided to the inspection of saxophone timbre quality,
and a perfect identification rate was achieved on high quality
saxophones, qualified quality saxophones, and
saxophones needing further adjustment.
The proposed system not only helps increase the
reliability of saxophone timbre quality judgment but also
reduces the effort required of professional musicians'
hearing for discriminating the different quality levels.
In particular, not only is the WFEM simple and the
dimension of the extracted features relatively low, but
MMTS can also identify the important features and reduce
the dimension of the feature vector, which makes our
proposed AMTCS more efficient. Besides, through the
feature selection function of MMTS, the significant tones
having an impact on saxophone timbre quality can be easily
identified. Having the quality information of the tones, the
next task is to find and improve the root causes
that make the tones abnormal.
References
1. M. S. Sanders and E. J. McCormick (1993), Human Factors in Engineering and Design, 7th Edition, McGraw-Hill, New York.
2. S. Ando and K. Yamaguchi (1993), "Statistical Study of Spectral Parameters in Musical Instrument Tones," Journal of the Acoustical Society of America, 94(1), pp. 37-45.
3. J. G. Proakis and D. G. Manolakis (2006), Digital Signal Processing: Principles, Algorithms, and Applications, Prentice Hall.
4. J. C. Brown (1991), "Calculation of a Constant Q Spectral Transform," Journal of the Acoustical Society of America, 89(1), pp. 425-434.
5. A. Wieczorkowska (2001), "Musical Sound Classification Based on Wavelet Analysis," Fundamenta Informaticae, 47(1-2), pp. 175-188.
6. W. J. Pielemeier, G. H. Wakefield, and M. H. Simoni (1996), "Time-frequency Analysis of Musical Signals," Proceedings of the IEEE, 84(9), pp. 1216-1230.
7. J. C. Brown (1999), "Computer Identification of Musical Instruments Using Pattern Recognition with Cepstral Coefficients as Features," Journal of the Acoustical Society of America, 105(3), pp. 1933-1941.
8. P. J. Ponce de León and J. M. Iñesta (2007), "Pattern Recognition Approach for Music Style Identification Using Shallow Statistical Descriptors," IEEE Transactions on Systems, Man, and Cybernetics - Part C: Applications and Reviews, 37(2).
9. S. Essid, G. Richard, and B. David (2006), "Musical Instrument Recognition by Pairwise Classification Strategies," IEEE Transactions on Audio, Speech, and Language Processing, 14(4).
10. G. D. Poli and P. Prandoni (1997), "Sonological Models for Timbre Characterization," Journal of New Music Research, 26(2), pp. 170-197.
11. J. D. Deng, C. Simmermacher, and S. Cranefield (2008), "A Study on Feature Analysis for Musical Instrument Classification," IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 38(2), pp. 429-438.
12. E. Benetos, M. Kotti, and C. Kotropoulos (2006), "Musical Instrument Classification Using Non-negative Matrix Factorization Algorithms and Subset Feature Selection," IEEE International Conference on Acoustics, Speech and Signal Processing, 5, pp. 221-224.
13. P. Herrera, A. Yeterian, and F. Gouyon (2002), "Automatic Classification of Drum Sounds: A Comparison of Feature Selection Methods and Classification Techniques," Lecture Notes in Computer Science, 2445, pp. 69-80.
14. G. Agostini, M. Longari, and E. Pollastri (2003), "Musical Instrument Timbres Classification with Spectral Features," EURASIP Journal on Applied Signal Processing, 1, pp. 5-14.
15. I. Kaminskyj and T. Czaszejko (2005), "Automatic Recognition of Isolated Monophonic Musical Instrument Sounds Using kNNC," Journal of Intelligent Information Systems, 24(2/3), pp. 199-221.
16. A. M. Fanelli, G. Castellano, and C. A. Buscicchio (2000), "A Modular Neuro-Fuzzy Network for Musical Instruments Classification," Lecture Notes in Computer Science, 1857, pp. 372-382.
17. B. Kostek (2004), "Musical Instrument Classification and Duet Analysis Employing Music Information Retrieval Techniques," Proceedings of the IEEE, 92(4), pp. 712-729.
18. A. Wieczorkowska and A. Czyzewski (2003), "Rough Set Based Automatic Classification of Musical Instrument Sounds," Electronic Notes in Theoretical Computer Science, 82(4), pp. 298-309.
19. J. C. Brown, O. Houix, and S. McAdams (2001), "Feature Dependence in the Automatic Identification of Musical Woodwind Instruments," Journal of the Acoustical Society of America, 109(3), pp. 1064-1072.
20. K. W. Berger (1964), "Some Factors in the Recognition of Timbre," Journal of the Acoustical Society of America, 36(10), pp. 1888-1891.
21. G. Taguchi and R. Jugulum (2002), The Mahalanobis-Taguchi Strategy, John Wiley & Sons, New York.
22. R. L. Burden and J. D. Faires (2000), Numerical Analysis, 7th Edition, Brooks/Cole.
23. University of Iowa Musical Instrument Sample Database, http://theremin.music.uiowa.edu/index.html
24. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1988), Numerical Recipes in C, Cambridge University Press, Cambridge, U.K.
25. V. N. Vapnik (1998), Statistical Learning Theory, Wiley, New York, NY.
26. U. H.-G. Kreßel (1999), "Pairwise Classification and Support Vector Machines," in Advances in Kernel Methods: Support Vector Learning, pp. 255-268, The MIT Press, Cambridge, MA.
27. G. Ou and Y. L. Murphey (2007), "Multi-class Pattern Classification Using Neural Networks," Pattern Recognition, 40(1), pp. 4-18.
28. T. G. Dietterich and G. Bakiri (1995), "Solving Multiclass Learning Problems via Error-correcting Output Codes," Journal of Artificial Intelligence Research, 2, pp. 263-286.
