
ISMIR 2008 – Session 1c – Timbre

INSTRUMENT EQUALIZER FOR QUERY-BY-EXAMPLE RETRIEVAL:
IMPROVING SOUND SOURCE SEPARATION BASED ON
INTEGRATED HARMONIC AND INHARMONIC MODELS

Katsutoshi Itoyama† * Masataka Goto‡ Kazunori Komatani†


Tetsuya Ogata† Hiroshi G. Okuno†
† Graduate School of Informatics, Kyoto University * JSPS Research Fellow (DC1)
‡ National Institute of Advanced Industrial Science and Technology (AIST)
{itoyama,komatani,ogata,okuno}@kuis.kyoto-u.ac.jp m.goto@aist.go.jp

ABSTRACT

This paper describes a music remixing interface, called Instrument Equalizer, that allows users to control the volume of each instrument part within existing audio recordings in real time. Although query-by-example retrieval systems generally require a user to prepare favorite examples (songs), our interface enables a user to generate examples from existing ones by cutting or boosting some instrument/vocal parts, resulting in a variety of retrieved results. To change the volume, all instrument parts are separated from the input sound mixture using the corresponding standard MIDI file. For the separation, we used an integrated tone (timbre) model consisting of harmonic and inharmonic models that are initialized with template sounds recorded from a MIDI sound generator. The remaining but critical problem here is to deal with various performance styles and instrument bodies that are not given in the template sounds. To solve this problem, we train probabilistic distributions of timbre features by using various sounds. By adding a new constraint of maximizing the likelihood of timbre features extracted from each tone model, we succeeded in estimating model parameters that better express actual timbre.

1 INTRODUCTION

One of the promising approaches to music information retrieval is query-by-example (QBE) retrieval [1, 2, 3, 4, 5, 6, 7], where a user receives a list of musical pieces ranked by their similarity to a musical piece (example) that the user gives as a query. Although this approach is powerful and useful, a user has to prepare or find favorite examples and sometimes finds it difficult to control or change the retrieved pieces after seeing them, because the user has to find another appropriate example to get better results. For example, if a user feels that vocal or drum sounds are too strong in the retrieved pieces, the user has to find another piece that has weaker vocal or drum sounds while keeping the basic mood and timbre of the piece. It is sometimes very difficult to find such a piece within a music collection.

We therefore propose yet another way of preparing an example for the QBE retrieval by using a music remixing interface. The interface enables a user to boost or cut the volume of each instrument part of an existing musical piece. With this interface, a user can easily give an alternative query with a different mixing balance to obtain refined results of the QBE retrieval. The issue in the above example of finding another piece with weaker vocal or drum sounds can thus be resolved. Note that existing graphic equalizers or tone controls on the market cannot control each individual instrument part in this way: they can adjust only frequency characteristics (e.g., boost or cut for bass and treble). Although remixing of stereo audio signals has been reported previously [8], that work tackled only harmonic instrument sounds. Our goal is to control all instrument sounds, including both harmonic and inharmonic ones.

This paper describes our music remixing interface, called Instrument Equalizer, in which a user can listen to and remix a musical piece in real time. It has sliders corresponding to different musical instruments and enables a user to manipulate the volume of each instrument part in polyphonic audio signals. Since this interface is independent of the succeeding QBE system, any QBE system can be used. In our current implementation, it leverages the standard MIDI file (SMF) corresponding to the audio signal of a musical piece to separate sound sources. We can assume that it is relatively easy to obtain such SMFs from the web, etc. (especially for classical music). Of course, given an SMF, it is quite easy to control the volume of instrument parts during SMF playback, and readers might think that such playback could be used as a query. Its sound quality, however, is not good in general, and users would lose their drive to use the QBE retrieval. Moreover, we believe it is important to start from an existing favorite musical piece of high quality and then refine the retrieved results.

2 INSTRUMENT EQUALIZER

The Instrument Equalizer enables a user to remix existing polyphonic musical signals. A screenshot of its interface is shown in Figure 1 and the overall system is shown in Figure 2. It has two features for remixing audio mixtures, as follows:


1. Volume control function. It provides the remixing function by boosting or cutting the volume of each instrument part, not by controlling the gain of a frequency band. A user can listen to the remixed sound mixture as soon as the user manipulates the volume.

2. Interlocking with the hardware controller. In addition to typical mouse control on the screen, we allow a user to use a hardware controller, shown in Figure 2, with multiple faders. It enables the user to manipulate the volume intuitively and quickly. This hardware controller makes it easy to manipulate the volume of multiple parts at the same time, which is difficult with a mouse.

Figure 1. Screenshot of main window.

Figure 2. Instrument Equalizing System.

To remix a polyphonic musical signal, the signal must be separated into each instrument part. We use an integrated weighted-mixture model consisting of harmonic-structure and inharmonic-structure tone models [9] for separating the signal, but improve the parameter estimation method of this model by introducing better prior distributions. This separation method needs a standard MIDI file (SMF) that is synchronized to the polyphonic signal. We assume that the SMF has already been synchronized with the input signal by using audio-to-score alignment methods such as [10, 11, 12]. For the separation using the integrated model, the parameters of the model are initialized by template sounds recorded from a MIDI sound generator, and gradually improved to represent the actual sounds in the sound mixture.

2.1 Internal architectures

Figure 3. System architecture.

This section describes the internal architecture for controlling the volume of each instrument part. The procedures described in this section are performed in real time, under the assumption that the musical signals of each instrument part have already been obtained in advance from the target polyphonic musical signal, as described in Section 3. Let x_k(c, t) and y_k(c, t) be a separated signal and the volume of instrument k at channel c and time t, respectively. y_k(c, t) satisfies the following condition:

  \forall k, c, t : 0 \le y_k(c, t) \le 1,

and y_k(c, t) is obtained as

  y_k(c, t) = (value of volume slider k) \cdot (value of the pan for channel c).

The overview of the architecture is shown in Figure 3.

1. Volume control function. The output signal, x(c, t), is obtained as

     x(c, t) = \sum_k y_k(c, t) \, x_k(c, t).

   Each y_k(c, t) is obtained in real time from the volume sliders in the GUI in Figure 1.

2. Interlocking with the hardware controller. The GUI and the hardware controller communicate via MIDI. If a user moves a hardware fader, a MIDI message representing the new volume is sent to the GUI, and vice versa. Since a motor is embedded in each fader, MIDI messages from the GUI move the fader to the position corresponding to the value of the volume.
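As a concrete illustration of the volume control function, the following sketch applies the per-part gains y_k(c, t) to pre-separated signals and sums them into the output mixture. It is a minimal example written for this description, assuming the separated signals are available as NumPy arrays; the variable names and the time-invariant slider and pan values are illustrative placeholders, not part of the actual implementation.

    import numpy as np

    def remix(separated, sliders, pans):
        """Mix pre-separated instrument signals with per-part volumes.

        separated: shape (K, C, T) -- separated signal x_k(c, t) of each part
        sliders:   shape (K,)      -- volume slider value of part k, in [0, 1]
        pans:      shape (K, C)    -- pan value of part k for channel c, in [0, 1]
        Returns the remixed signal x(c, t) with shape (C, T).
        """
        # y_k(c, t) = (value of volume slider k) * (value of the pan for channel c);
        # with static sliders this reduces to a (K, C) gain matrix.
        gains = sliders[:, None] * pans
        # x(c, t) = sum_k y_k(c, t) * x_k(c, t)
        return np.einsum('kc,kct->ct', gains, separated)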
3 SOUND SOURCE SEPARATION CONSIDERING TIMBRE VARIETIES

In this section, we first define our sound source separation problem and the integrated model. We then describe timbre varieties and the timbre feature distributions used for estimating the parameters of the model.


3.1 Integrated model of harmonic and inharmonic models

The sound source separation problem is to decompose the input power spectrogram, X(c, t, f), into the power spectrogram corresponding to each musical note, where c, t, and f are the channel (e.g., left and right), the time, and the frequency, respectively. We assume that X(c, t, f) includes K musical instruments and that the k-th instrument performs L_k musical notes. We use the tone model, J_kl(c, t, f), to represent the power spectrogram of the l-th musical note performed by the k-th musical instrument (the (k, l)-th note), and the power spectrogram of a template sound, Y_kl(t, f), to initialize the parameters of J_kl(c, t, f). Each musical note of the SMF is played back on a MIDI sound generator to record the corresponding template sound. Y_kl(t, f) is monaural because SMFs may not include any sound localization (channel) information. Y_kl(t, f) is normalized to satisfy the following relation, where C is the total number of channels:

  \iint \sum_c X(c, t, f) \, dt \, df = C \sum_{k,l} \iint Y_{kl}(t, f) \, dt \, df.

For this source separation, we define an integrated model, J_kl(c, t, f), as the sum of a harmonic-structure tone model, H_kl(t, f), and an inharmonic-structure tone model, I_kl(t, f), multiplied by the overall amplitude of the model, w_kl^(J), and the relative amplitude of each channel, r_kl(c):

  J_{kl}(c, t, f) = w^{(J)}_{kl} \, r_{kl}(c) \bigl( H_{kl}(t, f) + I_{kl}(t, f) \bigr),

where w_kl^(J) and r_kl(c) satisfy the following constraints:

  \sum_{k,l} w^{(J)}_{kl} = \iint \sum_c X(c, t, f) \, dt \, df, \qquad \forall k, l : \sum_c r_{kl}(c) = C.

All parameters of J_kl(c, t, f) are listed in Table 1. The harmonic model, H_kl(t, f), is defined as a constrained two-dimensional Gaussian mixture model (GMM), which is a product of two one-dimensional GMMs, \sum_m E^{(H)}_{kl}(m, t) and \sum_n F^{(H)}_{kl}(n, t, f), and is designed by referring to the harmonic-temporal-structured clustering (HTC) source model [13]. The inharmonic model, I_kl(t, f), is defined as a product of two nonparametric functions. These models are defined as follows:

  H_{kl}(t, f) = w^{(H)}_{kl} \sum_{m=1}^{M} \sum_{n=1}^{N} E^{(H)}_{kl}(m, t) \, F^{(H)}_{kl}(n, t, f),

  E^{(H)}_{kl}(m, t) = \frac{u_{kl}(m)}{\sqrt{2\pi}\,\phi_{kl}} \exp\Bigl( -\frac{(t - \tau_{kl} - m\phi_{kl})^2}{2\phi_{kl}^2} \Bigr),

  F^{(H)}_{kl}(n, t, f) = \frac{v_{kl}(n)}{\sqrt{2\pi}\,\sigma_{kl}} \exp\Bigl( -\frac{(f - n\omega_{kl}(t))^2}{2\sigma_{kl}^2} \Bigr),

  I_{kl}(t, f) = w^{(I)}_{kl} \, E^{(I)}_{kl}(t) \, F^{(I)}_{kl}(t, f),

where M is the number of Gaussian kernels representing the temporal power envelope and N is the number of Gaussian kernels representing the harmonic components. u_kl(m), v_kl(n), E_kl^(I)(t), F_kl^(I)(t, f), w_kl^(H), and w_kl^(I) satisfy the following conditions:

  \forall k, l : \sum_m u_{kl}(m) = 1, \qquad \forall k, l : \sum_n v_{kl}(n) = 1,

  \forall k, l : \int E^{(I)}_{kl}(t) \, dt = 1, \qquad \forall k, l, t : \int F^{(I)}_{kl}(t, f) \, df = 1,

  \text{and} \quad \forall k, l : w^{(H)}_{kl} + w^{(I)}_{kl} = 1.

Table 1. Parameters of the integrated model.
  w_kl^(J)              overall amplitude
  r_kl(c)               relative amplitude of each channel
  w_kl^(H), w_kl^(I)    relative amplitudes of the harmonic and inharmonic tone models
  u_kl(m)               coefficient of the temporal power envelope
  v_kl(n)               relative amplitude of the n-th harmonic component
  τ_kl                  onset time
  φ_kl                  diffusion of a Gaussian of the power envelope
  ω_kl(t)               F0 trajectory
  σ_kl                  diffusion of a harmonic component along the frequency axis
  E_kl^(I)(t)           power envelope of the inharmonic tone model
  F_kl^(I)(t, f)        relative amplitude of frequency f at time t of the inharmonic tone model

The goal of this separation is to decompose X(c, t, f) into J_kl(c, t, f) by estimating a spectrogram distribution function, Δ^(J)(k, l; c, t, f), which satisfies

  \forall k, l, c, t, f : 0 \le \Delta^{(J)}(k, l; c, t, f) \le 1 \quad \text{and} \quad \forall c, t, f : \sum_{k,l} \Delta^{(J)}(k, l; c, t, f) = 1.

With Δ^(J)(k, l; c, t, f), the separated power spectrogram, X_kl^(J)(c, t, f), is obtained as

  X^{(J)}_{kl}(c, t, f) = \Delta^{(J)}(k, l; c, t, f) \, X(c, t, f).

Furthermore, let Δ^(H)(m, n; k, l, t, f) and Δ^(I)(k, l, t, f) be spectrogram distribution functions which decompose X_kl^(J)(c, t, f) into each Gaussian distribution of the harmonic model and into the inharmonic model, respectively. These functions satisfy

  \forall k, l, m, n, t, f : 0 \le \Delta^{(H)}(m, n; k, l, t, f) \le 1, \qquad \forall k, l, t, f : 0 \le \Delta^{(I)}(k, l, t, f) \le 1,

  \text{and} \quad \forall k, l, t, f : \sum_{m,n} \Delta^{(H)}(m, n; k, l, t, f) + \Delta^{(I)}(k, l, t, f) = 1.
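For readers who prefer a computational view of Section 3.1, the sketch below evaluates the harmonic and inharmonic tone models of a single note on a discrete time-frequency grid, following the definitions above. It is an illustrative rendering we wrote for this description; the grid resolution, parameter values, and array names are placeholders, not the authors' implementation.

    import numpy as np

    def harmonic_model(t, f, u, v, tau, phi, omega, sigma, w_H):
        """H_kl(t, f): product of temporal and spectral one-dimensional GMMs."""
        m = np.arange(1, len(u) + 1)
        n = np.arange(1, len(v) + 1)
        # E^(H)_kl(m, t): Gaussians centred at tau + m * phi with width phi
        E = u[:, None] / (np.sqrt(2 * np.pi) * phi) * \
            np.exp(-(t[None, :] - tau - m[:, None] * phi) ** 2 / (2 * phi ** 2))
        # F^(H)_kl(n, t, f): Gaussians centred at the n-th harmonic n * omega(t)
        F = v[:, None, None] / (np.sqrt(2 * np.pi) * sigma) * \
            np.exp(-(f[None, None, :] - n[:, None, None] * omega[None, :, None]) ** 2
                   / (2 * sigma ** 2))
        return w_H * np.einsum('mt,ntf->tf', E, F)

    def inharmonic_model(E_I, F_I, w_I):
        """I_kl(t, f): product of two nonparametric functions."""
        return w_I * E_I[:, None] * F_I

    # Toy single-note example: J_kl(c, t, f) = w^(J) r(c) (H + I) with w^(J) = r(c) = 1.
    t = np.linspace(0.0, 1.0, 100)            # seconds
    f = np.linspace(0.0, 4000.0, 256)         # Hz
    omega = np.full_like(t, 440.0)            # flat F0 trajectory
    u = np.full(10, 0.1)                      # sums to 1
    v = np.full(30, 1.0 / 30.0)               # sums to 1
    H = harmonic_model(t, f, u, v, tau=0.05, phi=0.08, omega=omega, sigma=20.0, w_H=0.8)
    E_I = np.full_like(t, 1.0 / len(t))
    F_I = np.full((len(t), len(f)), 1.0 / len(f))
    J = H + inharmonic_model(E_I, F_I, w_I=0.2)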

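The distribution functions act as soft time-frequency masks. As a companion illustration, the following sketch applies given masks Δ^(J), Δ^(H), and Δ^(I) to obtain the separated and further decomposed power spectrograms; it assumes the masks have already been estimated, and the array layout and names are our own convention, not taken from the paper.

    import numpy as np

    def decompose(X, delta_J, delta_H, delta_I):
        """Apply the spectrogram distribution functions defined above.

        X:        mixture power spectrogram X(c, t, f), shape (C, T, F)
        delta_J:  Delta^(J)(k, l; c, t, f), shape (KL, C, T, F); sums to 1 over notes
        delta_H:  Delta^(H)(m, n; k, l, t, f), shape (KL, M, N, T, F)
        delta_I:  Delta^(I)(k, l, t, f), shape (KL, T, F)
        """
        # X^(J)_kl(c, t, f) = Delta^(J)(k, l; c, t, f) X(c, t, f)
        X_J = delta_J * X[None]
        # decomposed powers X^(H)_klmn(c, t, f) and X^(I)_kl(c, t, f) used in Appendix A
        X_H = delta_H[:, None] * X_J[:, :, None, None]
        X_I = delta_I[:, None] * X_J
        return X_J, X_H, X_I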

To evaluate the 'effectiveness' of this separation, we can use a cost function defined as the Kullback-Leibler (KL) divergence from X_kl^(J)(c, t, f) to J_kl(c, t, f):

  Q^{(J)}_{kl} = \sum_c \iint X^{(J)}_{kl}(c, t, f) \log \frac{X^{(J)}_{kl}(c, t, f)}{J_{kl}(c, t, f)} \, dt \, df.

By minimizing the sum of Q_kl^(J) over (k, l) with respect to Δ^(J)(k, l; c, t, f), we obtain the spectrogram distribution function and model parameters (i.e., the most 'effective' decomposition).

By minimizing Q_kl^(J) with respect to each parameter of the integrated model, we obtain model parameters estimated from the distributed spectrogram. This parameter estimation is equivalent to a maximum likelihood estimation. The parameter update equations are described in Appendix A.

3.2 Timbre varieties within each instrument

Even within the same instrument, different instrument bodies have different timbres, although these timbral differences are smaller than the differences among different musical instruments. Moreover, in live performances, each musical note can have a slightly different timbre according to the performance style. Instead of preparing a set of many template sounds to represent such timbre varieties within each instrument, we represent them by using a probabilistic distribution.

We use parameters of the integrated model, u_kl(m), v_kl(n), and F_kl^(I)(t, f), to represent the timbre variety of instrument k by training a diagonal Gaussian distribution with means μ_k^(u)(m), μ_k^(v)(n), μ_k^(F)(f) and variances Σ_k^(u)(m), Σ_k^(v)(n), Σ_k^(F)(f), respectively. Note that other probability distributions, such as a Dirichlet distribution, could also be used here. The model parameters for training the prior distribution are extracted from an instrument sound database [14] (i.e., these parameters are estimated without any prior distributions).

By minimizing the cost function

  Q^{(p)}_{kl} = \sum_c \iint X^{(J)}_{kl}(c, t, f) \log \frac{X^{(J)}_{kl}(c, t, f)}{J_{kl}(c, t, f)} \, dt \, df
      + \frac{1}{2} \sum_m \frac{(u_{kl}(m) - \mu^{(u)}_k(m))^2}{\Sigma^{(u)}_k(m)}
      + \frac{1}{2} \sum_n \frac{(v_{kl}(n) - \mu^{(v)}_k(n))^2}{\Sigma^{(v)}_k(n)}
      + \frac{1}{2} \iint \frac{(F^{(I)}_{kl}(t, f) - \mu^{(F)}_k(f))^2}{\Sigma^{(F)}_k(f)} \, dt \, df,

where the last three terms are the additional cost introduced by the prior distribution, we obtain the parameters by taking the timbre varieties into account. This parameter estimation is equivalent to a maximum a posteriori estimation.

3.3 Cost function without considering timbre feature distributions

In the previous study by Itoyama et al. [9], template sounds were used instead of timbre feature distributions to evaluate the 'goodness' of the feature vector. The cost function Q_kl^(Y) used in [9] can be obtained from Q_kl^(p) by replacing the negative log-likelihood terms of the prior with the KL divergence from Y_kl(t, f) (the power spectrogram of a template sound) to J_kl(c, t, f):

  Q^{(Y)}_{kl} = \sum_c \iint X^{(J)}_{kl}(c, t, f) \log \frac{X^{(J)}_{kl}(c, t, f)}{J_{kl}(c, t, f)} \, dt \, df
      + \sum_c \iint r_{kl}(c) Y_{kl}(t, f) \log \frac{r_{kl}(c) Y_{kl}(t, f)}{J_{kl}(c, t, f)} \, dt \, df.

4 EXPERIMENTAL EVALUATION

We conducted experiments to confirm whether the performance of the source separation using the prior distributions is equivalent to that using the template sounds. In the first experiment, we separated sound mixtures that were generated from a MIDI sound generator. In the second, the sound mixtures were created from multi-track signals [15] recorded before mixdown. In these experiments, we compared the following two conditions:

1. using the log-likelihood of the timbre feature distributions (proposed method, Section 3.2), and
2. using the template sounds (previous method [9], Section 3.3).

4.1 Experimental conditions

We used 5 SMFs from the RWC Music Database (RWC-MDB-P-2001 No. 1, 2, 3, 8, and 10) [16]. We recorded all musical notes of these SMFs by using two different MIDI sound generators made by different manufacturers. We used one of them for the test (evaluation) data and the other for obtaining the template sounds or training the timbre feature distributions in advance.

The experimental procedure is as follows:

1. initialize the integrated model of each musical note by using the corresponding template sound,
2. estimate all the model parameters from the input sound mixture, and
3. calculate the SNR in the frequency domain for the evaluation.

The SNR is defined as follows:

  SNR = \frac{1}{C (T_1 - T_0)} \sum_c \int SNR_{kl}(c, t) \, dt,

  SNR_{kl}(c, t) = \int \log_{10} \frac{X^{(J)}_{kl}(c, t, f)^2}{\bigl( X^{(J)}_{kl}(c, t, f) - X^{(R)}_{kl}(c, t, f) \bigr)^2} \, df,

where T_0 and T_1 are the beginning and ending times of the input power spectrogram, X(c, t, f), and X_kl^(R)(c, t, f) is the ground-truth power spectrogram corresponding to the (k, l)-th note (i.e., the spectrogram of the actual sound before mixing). We used 40-dimensional feature vectors for the prior distributions (dimensions 1-10: u_kl(1), ..., u_kl(M); dimensions 11-40: v_kl(1), ..., v_kl(N)), where M = 10 and N = 30. Other experimental conditions are shown in Table 2.
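A small sketch of this evaluation measure, computed on discrete spectrogram arrays (frame and bin widths are treated as unit steps, a small constant guards against division by zero, and the array names are ours):

    import numpy as np

    def snr(X_sep, X_ref):
        """Frequency-domain SNR of one separated note, following the definition above.

        X_sep: separated spectrogram X^(J)_kl(c, t, f), shape (C, T, F)
        X_ref: ground-truth spectrogram X^(R)_kl(c, t, f), same shape
        """
        eps = 1e-12
        # SNR_kl(c, t): integral over f of log10( X_sep^2 / (X_sep - X_ref)^2 )
        snr_ct = np.log10((X_sep ** 2 + eps) / ((X_sep - X_ref) ** 2 + eps)).sum(axis=2)
        # SNR: average over channels and over the interval [T0, T1]
        return snr_ct.mean()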
Table 2. Experimental conditions.
  Frequency analysis    Sampling rate: 44.1 kHz; analyzing method: STFT;
                        STFT window: 2048 points (Gaussian); STFT shift: 441 points
  Parameters            C = 2, M = 10, N = 30
  MIDI sound generator  Test data: YAMAHA MU-2000; template sounds: Roland SD-90

4.2 Experimental results

The results are listed in Tables 3 and 4. In both experiments, the SNRs were improved by increasing the number of instrument bodies used for training the prior distributions. Furthermore, the average SNR of P3 is equal to that of T in Table 3, and the average SNRs in Table 4 are higher than the average SNR of T. This means that although the SNRs decrease in some cases where the timbre difference between the template sounds and the input sounds is large, this can be resolved by using the better prior distributions.

Table 3. SNRs of the signals separated from sound mixtures generated from a MIDI tone generator. P1, P2, and P3 are the SNRs based on the prior distribution trained using 1, 2, and 3 instrument bodies, respectively; T is the SNR based on the template sounds.
         P1     P2     P3     T
  P001   11.5   12.1   11.6   14.0
  P002   12.3   12.3   12.5   12.3
  P003   11.5   11.8   12.4   10.8
  P008    8.1    7.8    8.3    4.9
  P010    9.1    8.8    8.9   12.2
  Ave.   10.5   10.6   10.8   10.8

Table 4. SNRs of the signals separated from sound mixtures generated from CD recordings.
         P1     P2     P3     T
  P001    9.3    9.4    9.4    9.0
  P002   11.4   11.5   11.7   11.6
  P003    4.0    4.1    4.2    3.1
  P008    7.1    7.2    7.3    6.1
  P010    8.4    8.5    8.5    8.0
  Ave.    8.1    8.2    8.3    7.6

5 CONCLUSION

In this paper, we have proposed the novel use of a music remixing interface for generating queries for QBE retrieval, explained our Instrument Equalizer on the basis of sound source separation using an integrated model consisting of harmonic and inharmonic models, and described a new parameter estimation method for the integrated model using the timbre feature distributions. We confirmed that this method increased separation performance for most instrument parts simply by using the basic timbre features.

Although the use of the timbre feature distributions is promising, it has not been fully exploited in our experiments. For example, we have not tried to use training data including various performance styles and instrument bodies. We plan to evaluate our method by using such various training data as well as more advanced timbre features. A performance benchmark for audio source separation [17] will be helpful for comparing our separation method with others. Future work will also include a usability evaluation of the Instrument Equalizer for use in QBE retrieval.

6 ACKNOWLEDGEMENTS

This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research of Priority Areas, the Primordial Knowledge Model Core of the Global COE program, and the CrestMuse Project.

7 REFERENCES

[1] Rauber, A., Pampalk, E., Merkl, D., "Using Psycho-acoustic Models and Self-organizing Maps to Create a Hierarchical Structuring of Music by Sound Similarity", Proc. ISMIR, pp. 71-80, 2002.

[2] Yang, C., "The MACSIS Acoustic Indexing Framework for Music Retrieval: An Experimental Study", Proc. ISMIR, pp. 53-62, 2002.

[3] Allamanche, E., Herre, J., Hellmuth, O., Kastner, T., Ertel, C., "A Multiple Feature Model for Musical Similarity Retrieval", Proc. ISMIR, 2003.

[4] Feng, Y., Zhuang, Y., Pan, Y., "Music Information Retrieval by Detecting Mood via Computational Media Aesthetics", Proc. of Web Intelligence, pp. 235-241, 2003.

[5] Thoshkahna, B. and Ramakrishnan, K. R., "Projekt Quebex: A Query by Example System for Audio Retrieval", Proc. ICME, pp. 265-268, 2005.

[6] Vignoli, F. and Pauws, S., "A Music Retrieval System Based on User-driven Similarity and Its Evaluation", Proc. ISMIR, pp. 272-279, 2005.

[7] Kitahara, T., Goto, M., Komatani, K., Ogata, T., Okuno, H. G., "Musical Instrument Recognizer "Instrogram" and Its Application to Music Retrieval based on Instrumentation Similarity", Proc. ISM, pp. 265-274, 2006.

[8] Woodruff, J., Pardo, B., and Dannenberg, R., "Remixing Stereo Music with Score-informed Source Separation", Proc. ISMIR, pp. 314-319, 2006.

[9] Itoyama, K., Goto, M., Komatani, K., Ogata, T., Okuno, H. G., "Integration and Adaptation of Harmonic and Inharmonic Models for Separating Polyphonic Musical Signals", Proc. ICASSP, pp. 57-60, 2006.

[10] Cano, P., Loscos, A., and Bonada, J., "Score-Performance Matching using HMMs", Proc. ICMC, pp. 441-444, 1999.

[11] Adams, N., Marquez, D., and Wakefield, G., "Iterative Deepening for Melody Alignment and Retrieval", Proc. ISMIR, pp. 199-206, 2005.

[12] Cont, A., "Realtime Audio to Score Alignment for Polyphonic Music Instruments using Sparse Non-negative Constraints and Hierarchical HMMs", Proc. ICASSP, pp. 641-644, 2006.

[13] Kameoka, H., Nishimoto, T., Sagayama, S., "Harmonic-temporal Structured Clustering via Deterministic Annealing EM Algorithm for Audio Feature Extraction", Proc. ISMIR, pp. 115-122, 2005.

[14] Goto, M., Hashiguchi, H., Nishimura, T., and Oka, R., "RWC Music Database: Music Genre Database and Musical Instrument Sound Database", Proc. ISMIR, pp. 229-230, 2003.

[15] Goto, M., "AIST Annotation for the RWC Music Database", Proc. ISMIR, pp. 359-360, 2006.

[16] Goto, M., Hashiguchi, H., Nishimura, T., and Oka, R., "RWC Music Database: Popular, Classical, and Jazz Music Databases", Proc. ISMIR, pp. 287-288, 2002.

[17] Vincent, E., Gribonval, R., Fevotte, C., "Performance Measurement in Blind Audio Source Separation", IEEE Trans. on ASLP, vol. 14, no. 4, pp. 1462-1469, 2006.

A DERIVATION OF THE PARAMETER UPDATE EQUATIONS

In this section, we describe the update equations of each parameter, derived from the M-step of the EM algorithm. The update equations are obtained by differentiating the cost function with respect to each parameter. Let X_klmn^(H)(c, t, f) and X_kl^(I)(c, t, f) be the decomposed powers:

  X^{(H)}_{klmn}(c, t, f) = \Delta^{(H)}(m, n; k, l, t, f) \, X^{(J)}_{kl}(c, t, f) \quad \text{and} \quad
  X^{(I)}_{kl}(c, t, f) = \Delta^{(I)}(k, l, t, f) \, X^{(J)}_{kl}(c, t, f).

A.1 w_kl^(J): overall amplitude

  w^{(J)}_{kl} = \sum_c \iint \Bigl( \sum_{m,n} X^{(H)}_{klmn}(c, t, f) + X^{(I)}_{kl}(c, t, f) \Bigr) \, dt \, df.

A.2 r_kl(c): relative amplitude of each channel

  r_{kl}(c) = \frac{ C \iint \bigl( \sum_{m,n} X^{(H)}_{klmn}(c, t, f) + X^{(I)}_{kl}(c, t, f) \bigr) \, dt \, df }
                   { \sum_c \iint \bigl( \sum_{m,n} X^{(H)}_{klmn}(c, t, f) + X^{(I)}_{kl}(c, t, f) \bigr) \, dt \, df }.

A.3 w_kl^(H), w_kl^(I): relative amplitudes of the harmonic and inharmonic tone models

  w^{(H)}_{kl} = \frac{ \sum_{c,m,n} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df }
                      { \sum_c \iint \bigl( \sum_{m,n} X^{(H)}_{klmn}(c, t, f) + X^{(I)}_{kl}(c, t, f) \bigr) \, dt \, df } \quad \text{and}

  w^{(I)}_{kl} = \frac{ \sum_c \iint X^{(I)}_{kl}(c, t, f) \, dt \, df }
                      { \sum_c \iint \bigl( \sum_{m,n} X^{(H)}_{klmn}(c, t, f) + X^{(I)}_{kl}(c, t, f) \bigr) \, dt \, df }.

A.4 u_kl(m): coefficient of the temporal power envelope

  u_{kl}(m) = \frac{ \sum_{c,n} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df + \mu^{(u)}_k(m) }
                   { \sum_{c,m,n} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df + 1 }.

A.5 v_kl(n): relative amplitude of the n-th harmonic component

  v_{kl}(n) = \frac{ \sum_{c,m} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df + \mu^{(v)}_k(n) }
                   { \sum_{c,m,n} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df + 1 }.

A.6 τ_kl: onset time

  \tau_{kl} = \frac{ \sum_{c,m,n} \iint (t - m \phi_{kl}) \, X^{(H)}_{klmn}(c, t, f) \, dt \, df }
                   { \sum_{c,m,n} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df }.

A.7 ω_kl(t): F0 trajectory

  \omega_{kl}(t) = \frac{ \sum_{c,m,n} \int n f \, X^{(H)}_{klmn}(c, t, f) \, df }
                        { \sum_{c,m,n} \int n^2 \, X^{(H)}_{klmn}(c, t, f) \, df }.

A.8 φ_kl: diffusion of a Gaussian of the power envelope

  \phi_{kl} = \frac{ -A^{(\phi)}_1 + \sqrt{ A^{(\phi)2}_1 + 4 A^{(\phi)}_2 A^{(\phi)}_0 } }{ 2 A^{(\phi)}_2 }, \quad \text{where}

  A^{(\phi)}_2 = \sum_{c,m,n} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df,

  A^{(\phi)}_1 = \sum_{c,m,n} \iint m (t - \tau_{kl}) \, X^{(H)}_{klmn}(c, t, f) \, dt \, df, \quad \text{and}

  A^{(\phi)}_0 = \sum_{c,m,n} \iint (t - \tau_{kl})^2 \, X^{(H)}_{klmn}(c, t, f) \, dt \, df.

A.9 σ_kl: diffusion of a harmonic component along the frequency axis

  \sigma_{kl} = \sqrt{ \frac{ \sum_{c,m,n} \iint (f - n \omega_{kl}(t))^2 \, X^{(H)}_{klmn}(c, t, f) \, dt \, df }
                            { \sum_{c,m,n} \iint X^{(H)}_{klmn}(c, t, f) \, dt \, df } }.

A.10 E_kl^(I)(t), F_kl^(I)(t, f): inharmonic tone model

  E^{(I)}_{kl}(t) = \frac{ \sum_c \int X^{(I)}_{kl}(c, t, f) \, df }{ \sum_c \iint X^{(I)}_{kl}(c, t, f) \, dt \, df } \quad \text{and}

  F^{(I)}_{kl}(t, f) = \frac{ \sum_c X^{(I)}_{kl}(c, t, f) + \mu^{(F)}_k(f) }{ \sum_c \int X^{(I)}_{kl}(c, t, f) \, df + 1 }.
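As an illustration of how the prior enters these updates, the following sketch implements the MAP update of u_kl(m) in A.4 on discretized spectrograms. Integrals are approximated by sums over unit-width bins, and the array names and shapes follow our own convention rather than the authors' code.

    import numpy as np

    def update_u(X_H, mu_u):
        """MAP update of u_kl(m) following A.4.

        X_H:  decomposed harmonic power X^(H)_klmn(c, t, f) of one note,
              shape (C, M, N, T, F)
        mu_u: prior mean mu^(u)_k(m), shape (M,)
        """
        # numerator: sum over c, n, t, f, plus the prior mean for each m
        num = X_H.sum(axis=(0, 2, 3, 4)) + mu_u
        # denominator: sum over c, m, n, t, f, plus 1
        den = X_H.sum() + 1.0
        return num / den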
