Yu-Hsiang Hsiao
Department of Industrial Engineering and Engineering Management
National Tsing Hua University
Hsinchu 300, Taiwan
e-mail: Adrian.iem88@nctu.edu.tw
to differentiate the quality levels of saxophones. It
is worth noting that pitch can be checked easily
with an apparatus, whereas timbre quality depends mainly
on the musician's hearing judgment. However,
the sensitivity and stability of human perception can
be influenced by many factors, such as emotions
(psychological) and tiredness (physiological) [1], which
may reduce the reliability of timbre quality
inspection. In fact, the saxophone timbre quality
judgment performed by professional musicians
in the final inspection stage can be regarded as a musical
instrument timbre classification problem.
Generally, the framework of a musical instrument
timbre classification task involves two phases: a
parameterization phase and a classification phase.
Because sound signals presented as raw waveforms
cannot be processed directly by a classification algorithm,
the parameterization phase extracts a wide set
of quantified features from the sound signals to
model their temporal and spectral
characteristics. The parameterization can be achieved
via various signal analysis techniques in the time,
frequency, or time-frequency domain [2]-[6]. Much
work has been done to identify the important
perceptual features for timbre recognition [8], [10]-[12],
[19], [20]. After the features characterizing the original
sound signals are extracted in the parameterization phase,
classification algorithms from statistics, soft
computing, or machine learning [13]-[16], [18]
are employed to accomplish the musical instrument
timbre classification or recognition task.
In this study, the multi-class Mahalanobis-Taguchi
system (MMTS) is proposed for multi-class
classification and feature selection. MMTS removes the
limitation of MTS, in which only one Mahalanobis space
is constructed per problem, and establishes an
individual Mahalanobis space for each class to
accomplish the multi-class classification
and feature selection tasks simultaneously.
In addition, an automatic
multi-class timbre classification system (AMTCS) is
established to increase the accuracy and reliability
of timbre quality inspection in saxophone manufacturing
and to prevent the judgment bias caused by human
perception. The AMTCS is composed of our proposed
waveform shape-based feature extraction method
(WFEM) in the parameterization phase and MMTS in
the classification phase. Taking a different point of view, the
WFEM parameterizes the sound signals by
directly capturing the shape properties of the signal waveform
instead of summarizing the signal behavior in the time or
frequency domain, as most research
work has done. Finally, a real case of inspecting
saxophone timbre quality with the verified AMTCS is
presented.
Abstract
A novel multi-class classification and feature selection
algorithm called the multi-class Mahalanobis-Taguchi System
(MMTS) is proposed in this study. In addition, to
improve the reliability of saxophone timbre quality
inspection, an automatic multi-class timbre classification
system (AMTCS) is developed. The AMTCS is composed
of our proposed waveform shape-based feature extraction
method (WFEM) in the parameterization phase and MMTS
in the classification phase. Employing the AMTCS
provides strong assistance to the inspection of
saxophone timbre quality, and a perfect identification rate
on saxophones with different timbre quality levels is
achieved.
Keywords: Classification, Feature selection, Feature
extraction, Mahalanobis-Taguchi System, Timbre.
1. Introduction
The Mahalanobis-Taguchi system (MTS), developed by
Dr. Taguchi, is a collection of methods proposed
as a forecasting, classification, and feature selection
technique using multivariate data [21]. So far, MTS has
been used successfully in various applications.
Nevertheless, all of these applications are restricted to
two-class problems because the MTS
algorithm can construct only one Mahalanobis space
per problem. To make MTS more
practical in real-world settings, it needs to be extended
effectively to multi-class problems, which is a non-trivial task.
Currently, the most popular way of extending a
two-class classification algorithm to a multi-class one is to
decompose a multi-class problem into a collection of
two-class problems [25]-[28].
Although
decomposition strategies have been used extensively
because of their simple concept and usefulness,
they are not without limitations.
In most cases,
decomposition strategies cannot
support a binary algorithm in performing
feature selection on the whole multi-class problem, even if the
algorithm does this well in a two-class problem.
Therefore, to extend MTS effectively to
multi-class classification problems and at the same
time ensure sound feature selection, it is worth
developing a new MTS algorithm that can handle
more than two classes simultaneously.
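The decomposition idea discussed above can be illustrated with a minimal Python sketch. The two strategies shown (one-vs-rest and one-vs-one) are common examples of decomposition and are an assumption here; the text does not say which strategies [25]-[28] cover. The function names and the label list are hypothetical.

```python
from itertools import combinations

def one_vs_rest(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]

def one_vs_one(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

# Illustrative labels only (three quality levels).
levels = ["high quality", "qualified quality", "adjustment needed"]
ovr = one_vs_rest(levels)
ovo = one_vs_one(levels)
print(len(ovr), len(ovo))  # 3 3
```

Each binary subproblem would run its own feature selection, so the selected subsets can differ from one subproblem to the next. This is one way to see why decomposition does not directly yield a single feature subset for the whole multi-class problem.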
In addition, despite today's highly developed
automation, saxophone manufacturing is still a
non-automated process that relies heavily on highly skilled
technicians.
Subtle
imprecision that affects the sound quality may be
introduced during handmade manufacture. Thus, sound
quality is tested in the final inspection by professional
musicians to ensure that the timbre quality and other sound
elements are within acceptable specifications, and further
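$$\phi_L(e) = \begin{cases} 1, & \text{if } (y[e] - A_L)(y[e+1] - A_L) \le 0 \\ 0, & \text{if } (y[e] - A_L)(y[e+1] - A_L) > 0 \end{cases}$$

$$Energy = \sum_{e=1}^{E} (y[e])^2$$

$$[\ldots] = \frac{CA_L}{CA_0} \qquad (7)$$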
Step 5: Derive the Existence Property on Each
Amplitude Level
The existence property on an amplitude level is
recorded by calculating the level-integral value of the
waveform from the first sample to the final
sample in a frame, relative to each amplitude level.
Because the sound is digitally recorded, the waveform is
formed by linearly connecting adjacent samples.
Thus, the integral of the waveform is given exactly
by the Trapezoid Rule [22], a method
for approximating the integral of a function over a
definite range. The Trapezoid Rule approximates the
integral of a function by fitting several trapezoids to the
region under the function and taking the total area of the
trapezoids as the approximate integral
value. Let $W_L(y)$ be the $A_L$-centered waveform
function. By the Trapezoid Rule with the width of each
trapezoid equal to 1, the level-integral value of $W_L(y)$
over the whole frame, i.e. $IN_L$, can be computed
using (8).
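$$IN_L = \int W_L(y)\,dy = \sum_{e=1}^{E-1} \frac{(y[e] - A_L) + (y[e+1] - A_L)}{2} \qquad (8)$$

$$VA_L = [\ldots] \qquad (1)$$

$$[\ldots], \quad e = 1, 2, \ldots, E \qquad (2)$$

$$IN_{y_{max}} = \sum_{e=1}^{E-1} [\ldots] \qquad (5)$$

$$CA_L = \sum_{e=1}^{E-1} \phi_L(e) \qquad (6)$$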
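As a minimal sketch of the quantities above, the following Python code computes the level-crossing count, the level-integral value by the Trapezoid Rule with unit width as in (8), and the frame energy. The function names and the sample frame are hypothetical, and the crossing test assumes the indicator equals 1 when the product of adjacent level-centered samples is less than or equal to zero.

```python
import numpy as np

def crossing_count(y, level):
    """Count adjacent sample pairs whose segment touches or crosses `level`."""
    d = np.asarray(y, dtype=float) - level
    return int(np.sum(d[:-1] * d[1:] <= 0))

def level_integral(y, level):
    """Trapezoid-rule integral of the level-centered waveform, unit width."""
    d = np.asarray(y, dtype=float) - level
    return float(np.sum((d[:-1] + d[1:]) / 2.0))

def energy(y):
    """Sum of squared samples in the frame."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(y ** 2))

# Hypothetical frame: one cycle of a triangle-like wave, E = 9 samples.
frame = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0])

print(crossing_count(frame, 0.25))   # 2
print(level_integral(frame, 0.25))   # -2.0
print(energy(frame))                 # 3.0
```

Because the waveform between samples is a straight line, each trapezoid matches the area under that segment, so the sum involves no approximation error beyond floating-point rounding.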
2.2 Classification Phase: Multi-class Mahalanobis-Taguchi System
To overcome the restriction that makes
MTS unsuitable for multi-class problems, the
MMTS was proposed according to the framework of
MTS and is composed of four main implementation stages.
In this study, the MMTS is employed as the main
algorithm in the classification phase of the musical instrument
timbre classification task.
The four stages of
implementing MMTS are detailed as follows.
Stage 1: Construction of a Full Model Measurement
Scale with the Mahalanobis Space of Each Class as the
Reference.
In this stage, the problem and all related features are
defined, and representative examples are collected to
construct the individual Mahalanobis space for each class
and establish the full model measurement scale. To
enhance the accuracy of the measurement
scale, the Gram-Schmidt orthogonalization process [21] is
applied to eliminate the multicollinearity among features
that makes the covariance matrix almost singular and the
inverse matrix invalid.
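The role of Gram-Schmidt orthogonalization can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function name is hypothetical, features are standardized with the population standard deviation, and the distance is scaled by the number of features, as is common in MTS.

```python
import numpy as np

def gram_schmidt(A):
    """Orthogonalize the columns of A (n samples x d features) in order."""
    A = np.asarray(A, dtype=float)
    U = np.zeros_like(A)
    for l in range(A.shape[1]):
        U[:, l] = A[:, l]
        for q in range(l):
            # Projection coefficient of feature l on orthogonal vector q.
            t = (A[:, l] @ U[:, q]) / (U[:, q] @ U[:, q])
            U[:, l] -= t * U[:, q]
    return U

# Synthetic "normal" examples with two correlated features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] += 0.8 * X[:, 0]

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized features
U = gram_schmidt(Z)                        # mutually orthogonal columns
s2 = (U ** 2).mean(axis=0)                 # variance of each column (means are zero)
md_gs = ((U ** 2) / s2).mean(axis=1)       # scaled distance of each example

print(round(float(md_gs.mean()), 6))       # 1.0
```

Since the orthogonalized columns are uncorrelated, the distance needs only their variances and no matrix inverse, which is why the approach tolerates strongly correlated features. On the examples used to build the space, the average scaled distance equals 1.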
For each of the $k$ classes, we define the examples sampled
from its population as "normal", while the examples
coming from the other $k - 1$ classes are defined as
"abnormal". The Mahalanobis space $MS_i$ can be
regarded as a database containing three components of
class $C_i$: the feature mean vector, the feature standard
$$U_l^{(p)} = A_l^{(p)} - \sum_{q=1}^{l-1} t_{lq(p)}\, U_q^{(p)} \qquad (11)$$

where $A_l^{(p)}$ is the $l$-th feature vector of $MS_p$ and, for $q = 1, 2, \ldots, l-1$:

$$t_{lq(p)} = \frac{A_l^{(p)T}\, U_q^{(p)}}{U_q^{(p)T}\, U_q^{(p)}} \qquad (12)$$

$$MD_r^{(i)} = \frac{1}{d} \sum_{q=1}^{d} \frac{\left(u_{rq}^{(i)}\right)^2}{\sigma_{q(i)}^2} \qquad (13)$$

where $d$ [...]

$$[\ldots] = -10 \log_{10} [\ldots] \qquad (14)$$

where $MD_{j(i)}^{(p \ne i)}$ is the Mahalanobis distance from the $j$-th [...]

$$Gain_l = SN_l^{+} - SN_l^{-} \qquad (15)$$

$$[\ldots] \qquad (16)$$

with weights $w_l$ and sums taken over $l \in R$.
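The idea of keeping one Mahalanobis space per class can be sketched in Python as below. Several points are assumptions rather than statements from the text: the decision rule (assign an example to the class whose space gives the smallest scaled distance), the use of an inverse correlation matrix as the third stored component, the direct matrix inverse in place of Gram-Schmidt, and all names and synthetic data.

```python
import numpy as np

def fit_space(X):
    """Build one class's reference space: mean, std, inverse correlation."""
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    corr_inv = np.linalg.inv(Z.T @ Z / len(X))
    return mean, std, corr_inv

def scaled_md(x, space):
    """Mahalanobis distance to a class space, divided by the feature count."""
    mean, std, corr_inv = space
    z = (np.asarray(x, dtype=float) - mean) / std
    return float(z @ corr_inv @ z) / len(z)

def classify(x, spaces):
    """Assumed rule: pick the class whose space gives the smallest distance."""
    return min(spaces, key=lambda c: scaled_md(x, spaces[c]))

# Synthetic two-feature data for three hypothetical classes.
rng = np.random.default_rng(1)
centers = {"A": (0.0, 0.0), "B": (5.0, 5.0), "C": (-5.0, 5.0)}
spaces = {c: fit_space(rng.normal(loc=m, scale=1.0, size=(40, 2)))
          for c, m in centers.items()}

print(classify([5.1, 4.9], spaces))  # B
```

Each class is measured against its own reference, so adding a class only adds one more space and leaves the existing ones unchanged.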
[Figure: waveform of one frame with amplitude levels A0 to A4 marked on each side of zero; y_max = 0.7344.]
3. Case Study
Houli, Taiwan is one of the main saxophone production
centers in the world. At present, 16 out of 25
Taiwanese musical instrument manufacturers are located
in Houli, and the industry employs
around 1500 people in total. In addition to developing their
own-brand (OBM) products, most manufacturers in Houli have long been
authorized by many
world-famous musical instrument companies to produce
instruments for them (OEM). According to statistics from the Ministry of
Economic Affairs, Taiwan, around one third of the
saxophones in the world are made in Houli, and the annual
output value reached one billion New Taiwan
dollars in 2006, which is around 50% of the total value in
the world.
The real case studied here comes from a saxophone
manufacturer located in Houli, Taiwan. The alto saxophone
is one of the main products of this manufacturer. At present,
the finished alto saxophones are grouped into three quality
levels according to their timbre, which is judged subjectively
by professional musicians in the final
inspection stage of the manufacturing procedure. The
three quality levels are high quality, qualified quality, and
adjustment needed.
However, the efficiency of the
inspection process is always low because the professional
musicians need to rest periodically to prevent
hearing fatigue. Moreover, a wrong judgment of
the saxophone quality may lead to huge loss of
Classification Accuracy (%)
Need Adjustment    Qualified Quality    High Quality    Overall
100.00             100.00               100.00          100.00
[Table: feature selection by fold (1 to 5 and overall) over the tones c1, d1, e1, f1, g1, a1, b1, c2, d2, e2, f2, g2, a2, b2, c3, with tone reduction; reported tone reduction values: 52%, 79%, 75%, 4%, 52%.]

Accuracy Rate (%)
Classification System    Need Adjustment    Qualified Quality    High Quality    Overall
AMTCS                    100.00             100.00               100.00          100.00
WFEM+SVM                  98.57             100.00               100.00           99.52
WFEM+C4.5                 88.73              98.00                90.26           92.31
WFEM+RST                  94.57              97.78                96.57           96.31
WFEM+SDA                 100.00             100.00               100.00          100.00
WFEM+LVQ                  98.00              98.33               100.00           98.78
WFEM+kNN                 100.00             100.00               100.00          100.00
WFEM+BPN                  98.18             100.00                97.50           98.56
MFCC+MMTS                100.00             100.00               100.00          100.00
[Figure: accuracy rate (axis from 0.86 to 1.06) by classification system: MFCC+MMTS, AMTCS, WFEM+C4.5, WFEM+SDA, WFEM+kNN, WFEM+SVM, WFEM+RST, WFEM+LVQ, WFEM+BPN.]
References
1. M. S. Sanders and E. J. McCormick (1993), Human
Factors in Engineering and Design, 7th Edition,
McGraw-Hill, New York.
2. S. Ando and K. Yamaguchi (1993), "Statistical
Study of Spectral Parameters in Musical Instrument
Tones," Journal of the Acoustical Society of
America, 94(1), pp. 37-45.
3. J. G. Proakis and D. G. Manolakis (2006), Digital
Signal Processing: Principles, Algorithms, and
Applications, Prentice Hall.
4. J. C. Brown (1991), "Calculation of a Constant Q
Spectral Transform," Journal of the Acoustical
Society of America, 89(1), pp. 425-434.
5. A. Wieczorkowska (2001), "Musical Sound
Classification Based on Wavelet Analysis,"
Fundamenta Informaticae, 47(1-2), pp. 175-188.
6. W. J. Pielemeier, G. H. Wakefield, and M. H.
Simoni (1996), "Time-Frequency Analysis of
Musical Signals," Proceedings of the IEEE, 84(9),
pp. 1216-1230.
7. J. C. Brown (1999), "Computer Identification of
Musical Instruments Using Pattern Recognition
with Cepstral Coefficients as Features," Journal of
the Acoustical Society of America, 105(3), pp.
1933-1941.
8. P. J. Ponce de León and J. M. Iñesta (2007),
"Pattern Recognition Approach for Music Style
Identification Using Shallow Statistical
Descriptors," IEEE Transactions on Systems, Man,
and Cybernetics - Part C: Applications and
Reviews, 37(2).
9. S. Essid, G. Richard, and B. David (2006),
"Musical Instrument Recognition by Pairwise
Classification Strategies," IEEE Transactions on
Audio, Speech, and Language Processing, 14(4).
10. G. D. Poli and P. Prandoni (1997), Sonological