
Frequency-Slope Estimation and Its Application to Parameter Estimation for Non-Stationary Sinusoids

Axel Röbel
Institut de Recherche et Coordination Acoustique / Musique
1 Place Igor-Stravinsky
75004 Paris, France
axel.roebel@ircam.fr

Computer Music Journal, 32:2, pp. 68–79, Summer 2008
© 2008 Massachusetts Institute of Technology.

Sinusoidal models are often used for the representation, analysis, or transformation of music or speech signals (Quatieri and McAulay 1986; Amatriain et al. 2002). An important step that is necessary for obtaining the sinusoidal model lies in estimating the amplitudes, frequencies, and phases of the sinusoids from the peaks of the Discrete Fourier Transform (DFT). The estimation is rather simple provided the signal is stationary. A standard method for this estimation is the quadratically interpolated Fast Fourier Transform (QIFFT) estimator (Abe and Smith 2005). The QIFFT estimator uses the bin at the maximum of each spectral peak together with its two neighbors to establish a second-order polynomial model of the log amplitude and unwrapped phase of the peak. The amplitude and frequency estimates of the sinusoid that is related to the spectral peak are then derived from the height and frequency position of the maximum of the polynomial. The evaluation of the phase polynomial at the frequency position provides the estimate of the phase of the sinusoid.

For non-stationary sinusoids, the parameter estimation becomes more difficult, because the QIFFT algorithm is severely biased whenever the frequency is not constant. The term bias refers to the systematic estimation error, that is, the error of the estimator that exists even if no measurement noise is present. For the partials in natural vibrato signals, the estimation bias of the QIFFT estimator accounts for a significant amount of residual energy (i.e., the energy remaining after subtracting the sinusoidal model from the original signal). This is the major reason for the perceived voiced energy in the residual of vibrato signals.

A number of algorithms with low estimation bias for non-stationary sinusoids have been proposed. Algorithms that try to implement a maximum likelihood estimate (MLE) generally assume that the amplitude of the sinusoids is constant. As an example, we refer to an algorithm that is based on signal demodulation employing an initial search over a grid of frequencies and frequency slopes and a final fine-tuning of the parameters using an iterative maximization of the amplitude of the demodulated signal (Abatzoglou 1986). Similar to multi-component signals with stationary sinusoids, the MLE of sinusoidal parameters for multi-component signals with frequency-modulated (FM) sinusoids is rather costly, because a highly nonlinear and high-dimensional cost function must be maximized (Saha and Kay 2002). Owing to the computational savings and despite the fact that windowing reduces the estimator efficiency (Offelli and Petri 1992), the windowing technique is generally preferred if the signal contains more than a single sinusoid.

Most of the algorithms that employ analysis windows for the parameter analysis of amplitude-modulated (AM) and / or FM sinusoids rely on the fact that the analysis window is approximately Gaussian, such that a mathematical investigation becomes tractable. Marques and Almeida (1986) developed this approach for sinusoids with linear FM and constant amplitude, and Peeters and Rodet (1999) extended it to sinusoids with linear FM and AM. Abe and Smith (2005) presented a version for sinusoids with linear FM and exponential AM. The method presented in Abe and Smith (2005) is special in that it tries to extend its range to other analysis windows by means of a set of linear bias-correction functions. The resulting estimator is computationally efficient and achieves small bias for standard windows as long as the zero-padding factor is sufficiently large (i.e., greater than three) and the modulation rates are relatively small.

In this article, we present a bias-correction scheme for sinusoidal parameter estimation of sinusoids with linear AM / FM modulation. As a first



step, we provide a mathematical foundation for the conjecture that linear amplitude modulation does not create any additional bias for the QIFFT estimator. With respect to bias reduction, we can therefore ignore the amplitude modulation of the signal.

Then we extend an initial version of our bias-reduction method that was originally proposed in Röbel (2006). The basic ideas of the algorithm are similar to those in Abatzoglou (1986) in that the algorithm is based on signal demodulation and maximization of the amplitude of the demodulated signal to find the sinusoidal parameters. In contrast to Abatzoglou, however, the present algorithm allows an analysis window, and the demodulation is obtained directly in the frequency domain. As a result, it can be applied if the signal contains more than a single sinusoid. Moreover, the initial two-dimensional grid search of the algorithm used by Abatzoglou is avoided because first, a simple and efficient initial estimate of the frequency slope is used, and second, the frequency and frequency slope estimation have been decoupled. After demodulating the frequency slope, the standard QIFFT estimator is applied to obtain an estimate of the sinusoidal parameters. Owing to the fact that the QIFFT estimator has small bias for constant-frequency sinusoids, the resulting estimate is significantly improved. The results described in Röbel (2006) suggest that the demodulation of individual sinusoidal components by means of spectral deconvolution, using only the observable part of the spectral peak to be analyzed and a properly selected and scaled demodulation kernel, creates only a small amount of additional bias in the QIFFT estimator.

The version of the algorithm presented here is a refined version of the original demodulation algorithm. The enhancements include a new procedure to improve the initial estimate of the frequency slope, reducing the remaining bias for large frequency slopes. Furthermore, the constraint to use the same analysis window for the signal spectrum and the demodulation kernel has been removed. Accordingly, it becomes possible to trade bias against noise sensitivity. A computationally efficient implementation of the algorithm using precomputed and linearly interpolated demodulation kernels is presented. We experimentally compare the new estimator with its previous version and the algorithm presented recently by Abe and Smith (2005) as well as the algorithm proposed by Peeters and Rodet (1999) using synthetic signals as well as a real-world vibrato signal.

This article is organized as follows. First, we show how the bias of the standard estimators is related to the frequency slope. Second, we describe the demodulation scheme and the improved frequency-slope estimator. Then, we present experimental results for the frequency-slope estimation algorithm as well as for the bias-reduction scheme by comparing the results of different algorithms. Furthermore, we compare different bias-reduction methods by examining the residual energy of the sinusoidal model of a real-world vibrato signal. We conclude with an outlook on further improvements.

Estimation Bias

The signal model used in the following assumes a linear evolution for amplitude and frequency trajectories. Accordingly, a complex discrete-time sinusoid can be represented as

$$s(n) = (A + an)\,e^{i(\varphi + 2\pi\omega_0 n + \pi D n^2)} \tag{1}$$

Here, $A$ is the mean amplitude of the signal, $a$ is the amplitude slope, $\varphi$ is the phase of the sinusoid at time $n = 0$, $\omega_0$ is its mean frequency, $i = \sqrt{-1}$, and $D$ is the frequency slope. Note that all frequency values are normalized with respect to the sample rate. The center of the analysis window is located at time 0 such that an ideal estimator should provide $(A, \omega_0, \varphi)$ as estimates for amplitude, frequency, and phase. The model in Equation 1 is necessarily time-limited because we assume $A + an > 0$ for all sample positions $n$ that are used in a signal analysis.

As an introduction to the problem, we will summarize the sources of bias that are known to exist for the standard QIFFT estimator and discuss their implications in the context of parameter estimation for sinusoids with linear AM / FM. The first source of bias results from the use of a second-order model for interpolating the spectral bins. For all but an infinitely long Gaussian window, the

amplitudes of the spectral bins do not follow a second-order polynomial. Accordingly, the interpolation is already systematically incorrect for stationary sinusoids, and therefore we will not discuss this source of bias here. Nevertheless, as will become clear later, it is important to reduce this type of bias as much as possible. This can be achieved by means of zero padding the analysis window or, as demonstrated recently, by means of simple bias correction functions (Abe and Smith 2004).

Second, a cross-component bias results from other sinusoidal components. Windowing is generally used to reduce this bias. The analysis window reduces the sidelobes of the sinusoidal components such that the cross-component bias of distant sinusoidal components can be effectively reduced. Note, however, that the reduction of the sidelobe amplitudes is always accompanied by an increased main lobe width. Therefore, the windowing technique will slightly increase the cross-component bias for nearby components. Moreover, owing to the tapering of the signal at the frame borders, the noise sensitivity of the parameter estimation is slightly increased. In the following, we will assume that the sinusoidal components are resolved such that the frequency distance between two sinusoids is always larger than the width of the main lobe of both components. In this case, the cross-component bias will stay nearly the same for stationary and non-stationary components such that the cross-component bias will only change marginally with the modulation of the sinusoids.

A third source of bias is caused by the non-stationary parameters. This bias has been analyzed mathematically for the sinusoidal model in Equation 1 and a Gaussian analysis window in Peeters and Rodet (1999). The result shows that the QIFFT algorithm suffers from additional bias owing to parameter variation only if the frequency slope $D \neq 0$. In this case, the estimation of all three basic parameters is biased, and the bias increases with the absolute value of $D$.

To study the dependency of the estimation bias on the frequency slope for arbitrary analysis windows, we split the sinusoidal model in Equation 1 into two parts: a sinusoid with constant amplitude $A$ and a sinusoid with mean amplitude 0 and amplitude slope $a$. Then, we investigate the properties of the spectra of the individual parts and use the linearity of the Fourier transform to draw conclusions for the complete spectrum. We first write the DFT of the signal in Equation 1 using a normalized analysis window $W(n)$ with $\sum_n W(n) = 1$:

$$S(\omega) = \sum_{n=-\infty}^{\infty} W(n)\,(A + an)\,e^{i(\varphi + 2\pi\omega_0 n + \pi D n^2)}\,e^{-i 2\pi\omega n} \tag{2}$$

Assuming the analysis window to have even symmetry, we can use the symmetry relations and remove all parts of the sum in Equation 2 that have odd symmetry in $n$. As a result, the DFT in Equation 2 simplifies into

$$S(\omega) = S_c(\omega) + S_l(\omega)$$

with

$$S_c(\omega) = A e^{i\varphi} \sum_{n=-\infty}^{\infty} W(n)\cos(2\pi(\omega_0 - \omega)n)\,e^{i\pi D n^2} \tag{3}$$

$$S_l(\omega) = a e^{i\varphi} \sum_{n=-\infty}^{\infty} W(n)\,n\,i\sin(2\pi(\omega_0 - \omega)n)\,e^{i\pi D n^2} \tag{4}$$

Here $S_c$ represents the spectrum of the constant-amplitude part, and $S_l$ represents the spectrum of the linear-amplitude part of the sinusoid.

For the discussion of Equations 3 and 4, we assume the coordinate system of the amplitude and phase spectra to be shifted using the translation $\omega' = \omega - \omega_0$. Accordingly, the frequency origin of $\omega'$ is located at the sinusoidal frequency $\omega_0$. For $D = 0$, the amplitudes of $S_c(\omega')$ and $S_l(\omega')$ are even functions having a local maximum and a local minimum, respectively, at the origin. The amplitude of $S_l(\omega')$ is zero at the origin. The phase of $S_c(\omega')$ is constant with value $\varphi$ within the main lobe. The phase of $S_l(\omega')$ is odd. It consists of two constant parts (with value $\varphi \pm \pi/2$) with a phase jump of $\pi$ at the origin. The sum of $S_c(\omega')$ and $S_l(\omega')$ has even amplitude and strictly increasing or decreasing phase with the value $Ae^{i\varphi}$ at the origin. Depending on the ratio of $A$ and $a$, the spectrum may present either a local maximum or minimum at the origin. Because $A + an$ in the sinusoidal model in Equation 1 is constrained to be positive, the resulting spectrum has a maximum for all common analysis windows.

Note that, because for all parameters $a$ the sum of



the two spectra keeps its maximum at the origin and because the phase at the origin does not depend on the value of $a$, the QIFFT estimator will provide unbiased estimates for amplitude, frequency, and phase. As our first result, we can conclude that for $D = 0$, the QIFFT estimator provides results that are biased only by the first two sources of bias mentioned previously and that the time-varying amplitude $a \neq 0$ does not add any additional bias.

For $D \neq 0$, the factor $e^{i\pi D n^2}$ adds an even phase to the elements of the sum. As a result, the magnitudes of $S_c(\omega')$ and $S_l(\omega')$ keep all the characteristics discussed previously, notably even symmetry and extrema values (maxima and minima). The unwrapped phase spectra, however, are no longer piecewise constant. Both phase spectra have an additional even phase function superimposed. The phase offset of $S_c(\omega')$ does not vanish at the origin, and by consequence, the phase is biased already for $a = 0$. For $a \neq 0$, the even-symmetric phase offset that is applied to $S_l(\omega')$ will destroy the even symmetry of the magnitude of $S(\omega')$ such that the peak maximum moves away from the origin, and therefore the amplitude and frequency estimates of the QIFFT estimator are no longer correct. Accordingly, the QIFFT estimator suffers from additional bias quite similar to that shown for the Gaussian window in Peeters and Rodet (1999).

Reducing the Bias

In the previous section, we saw that the source of the bias of the QIFFT estimator is the frequency slope of the sinusoid. A conceptually simple approach to estimate the parameters $(A, \omega_0, \varphi)$ of a sinusoid related to a spectral peak requires two steps: first estimate the frequency slope, then demodulate the sinusoid and use the QIFFT estimator to find the sinusoidal parameters.

Note that this approach is in principle equivalent to the MLE for constant-amplitude linear FM signals described in Abatzoglou (1986). Because the demodulation technique is used for the frequency-slope estimation, we first discuss the frequency-domain demodulation algorithm. In the subsequent section, the frequency-slope estimation is described.

Demodulation

The main objective of the present algorithm is to provide a means to demodulate the sinusoid using only the part of the spectral peak that is accessible for analysis. Because the sinusoidal component is contaminated by noise, this part will generally be the part of the main lobe exceeding the noise level. Initially, we assume we are given a frequency slope estimate $\hat{D} = D$ for a peak that is part of a signal spectrum.

In the time domain, the demodulation can be achieved simply by multiplication with a demodulator signal $y(n) = e^{-i\pi \hat{D} n^2}$. Multiplication of the demodulator signal with the input signal in Equation 1 will remove the frequency slope and keep all other parameters unchanged such that the QIFFT algorithm can be applied without additional bias. However, because other sinusoids may be present in the signal, we cannot apply time-domain demodulation directly.

The demodulation algorithm that uses only the observed part of the spectral peak to approximately demodulate the sinusoidal component is described here in the frequency domain. Assume $S(k)$ is the $N$-point DFT of the sinusoid to be analyzed and $Y(k)$ is the DFT of the demodulator signal. All DFT spectra are calculated such that the origin of the DFT basis functions is in the center of the analysis window. The signal analysis window is $w_s(n)$, and the demodulator signal is windowed using $w_y(n)$. To obtain the demodulated sinusoid spectrum $X(k)$, we would need to compute the circular convolution

$$X(k) = C\,\frac{S(k) \circledast Y(k)}{N} \tag{5}$$

where $C$ is a normalization factor taking into account windowing effects. As a result of this operation, we obtain the spectrum of the product of the demodulator and sinusoidal component windowed by the product window $w_y(n)w_s(n)$. Therefore, proper normalization would be achieved by means of setting $C = 1/\sum_n w_y(n)w_s(n)$.

Because only part of the sinusoid spectrum is available, the normalization factor should be adapted. Assume the peak under investigation is denoted by $P(k)$. $P(k)$ is part of the spectrum $S(k)$,

and it covers $B$ bins. To be able to take into account the impact of the missing part of the spectrum, we create a spectral model of the observed sinusoid assuming the initial slope estimate $\hat{D}$ is correct:

$$P_m(k) = \sum_n w_s(n)\,e^{i\pi \hat{D} n^2}\,e^{-\frac{2\pi i}{N} kn} \tag{6}$$

We then select a subset $\bar{P}_m(k)$ of $B$ bins around the center frequency $k = 0$. (Note that in the case that $B$ is even, the resulting model is not symmetric.) The required normalization factor can now be approximately estimated as

$$C' = \frac{1}{\max_k\left(\left|\bar{P}_m(k) \circledast Y(k)\right|\right)} \tag{7}$$

Accordingly, if we replace $S(k)$ in Equation 5 by $P(k)$, we should demodulate using the corrected normalization factor $C'$.

The correction factor will be more precise (i.e., lower bias) for demodulator windows that concentrate more energy in the $B$-bin-wide band around frequency 0 of the spectrum. This calls for higher-order windows with low sidelobes. The demodulator window, however, will be applied to the signal such that, according to Offelli and Petri (1992), the noise sensitivity of the analysis is increased. This calls for low-order windows with larger sidelobes. Accordingly, the demodulator window allows a trade-off between noise sensitivity and bias. The experimental investigation suggests that the use of the Hanning window as demodulator window $w_y$ is a favorable choice for all analysis windows $w_s$.

The compensation of the normalization factor assumes that the amplitude slope $a = 0$ and that the peak model is cut symmetrically with respect to the peak center. To achieve a good match between the normalization factor and the missing part of the spectrum of the sinusoidal component that creates the peak $P(k)$, the peak that is extracted from the spectrum should be as close as possible to the peak model that is used to derive the compensation factor. A number of strategies to extract the observed peak from the spectrum have been compared. Experimentally, we found that cutting the peak such that its left and right magnitudes have approximately the same value creates the smallest bias. Besides the fact that this method achieves perfect compensation for $a = 0$, there is a second advantage of this method that is related to the impact of the background noise. Assuming the background noise energy is locally constant and understanding the maximum border amplitude of the peak as a very rough indicator of the background noise level, we can conclude that cutting the peak at its maximum border level could be beneficial, because it avoids the parts of the signal where the background noise is dominant.

A final point to note here is that, for parameter estimation from demodulated peaks with the QIFFT estimator, it is essential to use the bias-correction functions proposed in Abe and Smith (2004) with correction factors adapted to the effective window $w_y(n)w_s(n)$.

Our experimental investigation shows that the spectra of the demodulation kernels $Y(k)$ and the related observed peak models $P_m(k)$ can be precalculated for a fixed grid of frequency slopes and then linearly interpolated to obtain an approximate spectral peak for any given slope. If the length of the analysis windows is $M$, a frequency slope grid with step size $0.025/M^2$ is sufficient to produce estimates that are nearly indistinguishable from the results produced with the non-interpolated kernels. To use the complete information that is available in the observed peak, we use deconvolution kernels of length $2B + 1$ centered around the maximum of the deconvolution spectrum.

The deconvolution can be implemented in the frequency domain as described or in the time domain. Time-domain implementation is probably more efficient if at least the demodulation kernel could be directly stored in the time domain. The possibilities of time-domain interpolation of the demodulation kernels have not yet been studied; we believe, however, that time-domain interpolation would require on-the-fly generation of the complex kernels from interpolated phase functions. Owing to the linearly modulated frequencies of the demodulation kernels, this would most likely be less efficient than the frequency-domain implementation described herein.
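The effect of ideal time-domain demodulation on the QIFFT estimator can be illustrated with a short numerical sketch. It is not the frequency-domain algorithm of this article: it assumes a single sinusoid, a known slope $D$, and a plain three-point log-magnitude QIFFT without the bias-correction functions of Abe and Smith (2004). The helper `qifft` and all parameter values are illustrative assumptions.

```python
import numpy as np

def qifft(frame, nfft):
    """Three-point quadratic interpolation of the log-magnitude peak.

    Returns (amplitude, normalized frequency). Hypothetical helper written
    for this illustration only.
    """
    mag = np.abs(np.fft.fft(frame, nfft))
    k = int(np.argmax(mag))
    left, mid, right = np.log(mag[[k - 1, k, (k + 1) % nfft]])
    delta = 0.5 * (left - right) / (left - 2.0 * mid + right)  # offset in bins
    amp = np.exp(mid - 0.25 * (left - right) * delta)
    return amp, (k + delta) / nfft

M, nfft = 1001, 4096                    # window length and FFT size
n = np.arange(M) - (M - 1) // 2         # time axis centered on the window
# Illustrative parameters (assumed): linear AM and strong linear FM
A, a, phi, w0, D = 1.0, 0.5 / M, 0.3, 0.2503, 4.0 / M**2

# Signal model of Equation 1
s = (A + a * n) * np.exp(1j * (phi + 2 * np.pi * w0 * n + np.pi * D * n**2))

w = np.hanning(M)
w /= w.sum()                            # normalized window, sum(w) == 1

# QIFFT applied directly to the windowed chirp (biased)
amp_raw, freq_raw = qifft(w * s, nfft)

# Demodulate with y(n) = exp(-i*pi*D*n^2), using the true slope, then QIFFT
y = np.exp(-1j * np.pi * D * n**2)
amp_dem, freq_dem = qifft(w * s * y, nfft)

print(f"raw:         amplitude {amp_raw:.4f}, frequency {freq_raw:.6f}")
print(f"demodulated: amplitude {amp_dem:.4f}, frequency {freq_dem:.6f}")
```

With these values the undemodulated amplitude estimate falls well below $A = 1$, whereas the demodulated peak recovers the amplitude and frequency closely, which is the behavior the demodulation step is meant to exploit.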


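The relation in Equation 5 between the circular convolution of the two spectra and the spectrum of the windowed product signal can also be checked numerically. The following sketch assumes the complete spectrum $S(k)$ is available (so the plain factor $C$ applies rather than $C'$), a constant-amplitude sinusoid located exactly on a DFT bin, and an arbitrary choice of a Blackman signal window; the helper `centered_dft` and all values are illustrative assumptions.

```python
import numpy as np

N = 256                                   # DFT size (no zero padding, for brevity)
n = np.arange(N) - N // 2                 # time axis with n = 0 at the window center
# Illustrative parameters (assumed); w0 = k0 / N lies exactly on bin k0
A, phi, k0, D = 1.0, 0.7, 32, 4.0 / N**2

s = A * np.exp(1j * (phi + 2 * np.pi * (k0 / N) * n + np.pi * D * n**2))
y = np.exp(-1j * np.pi * D * n**2)        # demodulator signal with the exact slope
ws = np.blackman(N)                       # signal analysis window (arbitrary choice)
wy = np.hanning(N)                        # demodulator window

def centered_dft(x):
    """DFT whose basis functions have their origin at the window center."""
    return np.fft.fft(np.fft.ifftshift(x))

S = centered_dft(ws * s)
Y = centered_dft(wy * y)

# Direct circular convolution: conv[k] = sum_m S[m] * Y[(k - m) mod N]
idx = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
conv = (S[None, :] * Y[idx]).sum(axis=1)

C = 1.0 / np.sum(wy * ws)                 # normalization for the product window
X = C * conv / N                          # Equation 5

# Reference: spectrum of the demodulated signal under the product window
X_ref = C * centered_dft(ws * wy * s * y)

print(np.allclose(X, X_ref), abs(X[k0]), np.angle(X[k0]))
```

In this idealized setting the demodulated peak at bin `k0` has magnitude $A$ and phase $\varphi$, which is why $C = 1/\sum_n w_y(n)w_s(n)$ is the proper normalization when the full spectrum is used.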

Frequency Slope Estimation

As mentioned previously, the maximum likelihood (ML) frequency slope estimator for constant-amplitude, linear FM sinusoids maximizes the amplitude of a demodulated peak (Abatzoglou 1986). Accordingly, the maximization of the amplitude of the demodulated peak using the demodulation algorithm described above can be considered as an approximate MLE provided the amplitude slope is sufficiently small.

To avoid searching a large grid of frequency slopes, we propose to use an approximate initial estimate of the frequency slope $\hat{D}$ and then to use the frequency slope estimate and two slopes with $\hat{D} \pm D_o$ to create three different demodulations of the observed peak. From the amplitudes of these demodulated peaks, a second-order polynomial model of the relationship between frequency slope and demodulated amplitude can be derived. The maximum of this polynomial is expected to provide a refined estimate of the frequency slope.

The open question we must address involves how best to obtain an approximate estimate of the frequency slope. Given the highest-order coefficients $\alpha_\varphi$ and $\alpha_A$ of the QIFFT polynomial for amplitude $A$ and phase $\varphi$ of the peak under investigation, the frequency-slope estimate for a Gaussian analysis window is

$$\hat{D} = \frac{\alpha_\varphi}{\alpha_\varphi^2 + \alpha_A^2} \tag{8}$$

according to Peeters (2001) and Abe and Smith (2005). Note the remarkable fact that the same estimator has been obtained for exponential amplitude evolution by Abe and Smith (2005) and for a first-order approximation of the spectrum of a sinusoid with linear amplitude evolution by Peeters (2001). The fact that the amplitude evolution function does not affect the frequency-slope estimator leads us to suppose that Equation 8 will provide useful estimates for other windows than the Gaussian window as well. The argument here is that the signal obtained after the analysis window has been applied can always be considered to be equivalently generated by means of a Gaussian analysis window and a sinusoid with appropriately modified amplitude evolution. Because the desired frequency estimate does not change with the amplitude evolution of the sinusoid and because the estimator in Equation 8 appears to be rather insensitive to small changes of the amplitude evolution of the sinusoid, it is considered an approximate estimator for the frequency slope for arbitrary analysis windows.

The free parameter to select is the frequency slope offset $D_o$. In general, a polynomial approximation improves when the approximation range is decreased. This would call for a small $D_o$. In the present case, however, the relationship between demodulation slope and amplitude of the demodulated peak is obscured by measurement noise (owing to estimation errors of the partially observed sinusoidal spectrum, the sampling of the Fourier spectrum by the DFT, and the amplitude of the demodulated peak) such that a larger value of $D_o$ might be beneficial. The selection of the $D_o$ parameter will be discussed further in light of the experimental results.

The precision of the frequency-slope estimate obtained from the maximum of the polynomial is slightly but consistently improved if the polynomial model is not constructed for the demodulated amplitudes $A_i$ but rather for $A_i/C'_i$, where $C'_i$ is the normalization factor from Equation 7. A theoretical explanation of this experimental finding has not yet been obtained. Using $C'_i$ to calculate the polynomial model of the demodulated amplitudes will obviously create biased amplitude estimates. For the problem of slope estimation, it appears to improve the fit of the polynomial model such that it is preferred here. After the slope has been determined from the maximum of the polynomial, a re-normalization can be performed if the unbiased amplitudes of the supporting points are required.

Experimental Evaluation

The proposed parameter-estimation procedure is evaluated by comparing it to a number of recent parameter-estimation algorithms that have been

proposed to work on non-stationary sinusoids. Notably, we use the bias-correction algorithm proposed in Abe and Smith (2005) and the algorithm of Peeters and Rodet (1999). The results of these algorithms are denoted as AS and PR, respectively. Furthermore, we use the original version of the demodulation estimator described in Röbel (2006; denoted as DE) and the new version that includes slope enhancement and uses the Hanning window for all demodulation kernels (denoted as DS).

All experiments are performed with Gaussian and Hanning analysis windows if the algorithms support them. The window type that is used is indicated by adding the letter G for Gaussian, H for Hanning, or X for both, to the estimator abbreviation. In performance comparisons of the estimators, we will use the expression "DSX is better than ASX" to denote the fact that DSH and DSG are better than ASH and ASG, respectively. The window applied to the demodulation kernels will be equal to the analysis window for DEX and Hanning for DSX. The Gaussian analysis window is cut such that it has a length of $8\sigma$, with $\sigma$ being the standard deviation of the Gaussian. To facilitate orientation, we display the results of the QIFFT estimator as well as the Cramér-Rao bounds for second-order polynomial phase estimation described in Ristic and Boashash (1998). Note, however, that these bounds have been derived for constant-amplitude polynomial phase signals, such that they can only be used to provide an approximate idea of the estimator efficiency.

In these experiments, we use synthetic test signals with a single sinusoid according to Equation 1 with $A = 1$, $\omega_0$ randomly sampled from a uniform distribution over the normalized frequency range $[0.2, 0.3]$, $\varphi$ randomly chosen from a uniform distribution between $[-\pi, \pi]$, and varying slopes $a$ and $D$. The analysis window covers $M = 1{,}001$ samples in all cases. The frequency slope $D$ is selected from a uniform distribution over the interval $[-D_{max}, D_{max}]$. Similarly, the amplitude slope $a$ is sampled from a uniform distribution over the range $[-a_{max}, a_{max}]$. The slope ranges are considered realistic for real-world signals. Note that in harmonic signals, the frequency slope scales with the partial number such that for high partials, extreme slopes may arise. The implementation of the algorithm used for the experimental investigation uses linearly interpolated demodulation kernels as proposed above.

Frequency Slope Estimation

The first experiment investigates the frequency-slope estimation. Figure 1 shows the results obtained with the enhanced demodulator DS and with the AS method according to Equation 8. Because the DE and PRG estimators use exactly the same frequency-slope estimate as the AS estimator, we do not consider those estimators here. We use two different zero-padding factors (Fast Fourier Transform [FFT] sizes $N = 1{,}024$ and $N = 4{,}096$) and two different sets of modulation ranges. The strong modulation uses $D_{max} = 4/M^2$ and $a_{max} = 1/M$, and for weak modulation we select $D_{max} = 0.5/M^2$ and $a_{max} = 0.15/M$. Note that the weak modulation range approximately covers the interval for which the ASH bias correction has been derived in Abe and Smith (2005). The DSX estimator has been tested with a set of demodulation offsets $D_o \in [0.2, 0.4, 0.5, 0.6, 0.8]/M^2$.

The results demonstrate that the selection of this parameter is rather uncritical. It has a notable effect only for the DSH estimator, a very small zero-padding factor, and strong modulation. This is related to the fact that the initial frequency-slope estimate of the ASH that is the basis of the slope refinement in DSH is rather poor. If $D_o$ is smaller than the error, then the correction with the polynomial model becomes less precise. Even for the smallest offset, the DSH estimator was never worse than the ASH estimator. The smallest offset that works close to the optimum for all of the experiments was $D_o = 0.5/M^2$. Accordingly, we selected this value for the following experiments.

A number of conclusions can be drawn from the experimental results in Figure 1. First, for strong modulation the DSX methods have significantly lower bias than the ASX methods. Second, for the Hanning window, the DSH estimator compared to



Figure 1. Frequency slope estimation errors for the DS estimator with slope offset $D_o = 0.5/M^2$ and the AS estimator. Window size is $M = 1{,}001$ samples and sinusoids with weak (a, b) and strong (c, d) amplitude and frequency modulation are considered. DFT size is $N = 1{,}024$ samples (a, c), and $N = 4{,}096$ samples (b, d). The CRB for constant amplitude polynomial phase signals is displayed as lower limit. Algorithms using a Gaussian / Hanning window are distinguished by means of solid / dashed lines.

the ASH estimator achieves a reduction of the estimation bias by 2–30 dB. The smallest improvement is achieved for weak modulation and large zero-padding factors. The only case where the AS estimator significantly outperforms the DS estimator is weak modulation with a small zero-padding factor and Gaussian analysis window. This could have been expected, because the ASG estimator is close to optimal for the Gaussian analysis window and the small zero-padding factor does not influence this estimator. As expected, the Hanning window has larger bias than the Gaussian window, but at the same time it is less sensitive to noise by about 4 dB. In general, the DSX estimators are more sensitive to noise than the ASX estimators by about 2–3 dB.

Bias Correction

We now investigate the main topic of this article, bias reduction. Owing to space constraints, we present only a few of the experiments we conducted. The results for all parameters for strong modulation with $D_{max} = 4/M^2$, $a_{max} = 1/M$, and an FFT size of $N = 4{,}096$ are presented. Furthermore, we select phase-bias reduction as an example and discuss bias reduction for the phase estimate for weak and strong modulation and FFT sizes $N = 1{,}024$ and $N = 4{,}096$.

The results of the bias reduction algorithm for strong modulation and $N = 4{,}096$ are displayed in Figures 2a–c. As expected, the amplitude estimate

Figure 2. Comparison of the estimation errors for the different parameter estimators using (a–c) window size $M = 1{,}001$ and FFT size $N = 4{,}096$ and (strong) linear AM / FM with $D_{max} = 4/M^2$ and $a_{max} = 1/M$; (d–f) phase-estimation errors for different modulation limits and FFT sizes. The CRB for constant amplitude polynomial phase signals is displayed as lower limit. Algorithms using a Gaussian / Hanning window are distinguished by means of solid / dashed lines.



of ASX (see Figure 2a) is strongly biased because the amplitude-trajectory model does not match the signal. The PRG estimator that is based on linear AM performs much better, but still cannot achieve the performance of either the DE or the DS algorithm. The DE and DS algorithms perform similarly, and better than PRG even when using a Hanning window. Note that the improved frequency-slope estimate of the DSX estimator yields only a minor improvement for the amplitude estimate compared to DEX and that the increase of the noise sensitivity of DEX and DSX is negligible. For frequency (see Figure 2b) and phase estimation (see Figure 2c), DSX has by far the smallest bias compared to the other estimators using the same analysis window. DEH and ASH perform approximately the same for both frequency and phase estimation. Given that the DEX and ASX estimators both use the same frequency-slope estimate, this shows that the bias of these two estimators is caused by the error in the frequency-slope estimate, which is improved by the refined slope estimate of DSX. Note that the PRG estimator performs slightly worse than the ASG estimator. This seems remarkable given the fact that the ASG estimator has been derived for exponential AM. The increase of the noise sensitivity for the demodulation algorithms is negligible for phase estimation.

Figures 2d–f show the effect of the phase-bias removal for all the experimental settings that were used in the evaluation of the frequency-slope estimation herein. A close inspection of the results reveals that the performance of the bias removal is directly related to the performance of the frequency-slope estimation. This is as expected, because any error in the frequency-slope estimate will translate into an error in the bias-correction algorithm. With respect to algorithms using the Hanning window, we can see that the DSH achieves the best results in all cases. The ASH algorithm comes rather close only for a large zero-padding factor and weak modulation. For Gaussian analysis windows, the DSG estimator is the best, with the exception of the smallest zero-padding factor and small modulation, where the DEH estimator achieves an advantage of about 10 dB owing to better frequency-slope precision. In this case, the DSG estimator performs about 2–4 dB worse than the ASG and PRG estimators.

As a summary of the experimental investigation of the algorithm using synthetic signals, we conclude that compared to the QIFFT estimator, all the bias-reduction algorithms dramatically reduce the estimation bias. Compared to the recent ASX estimator, the simple and enhanced demodulation algorithms both provide a significant reduction of the estimation bias, especially if the range of the modulation is not confined to the rather limited range of values that has been considered in Abe and Smith (2005). Aside from the case of amplitude estimation, no remarkable differences exist between the ASG and PRG estimators. Comparing the DEX and DSX algorithms, we have demonstrated that the enhanced slope estimation has a direct and significant impact on the bias of the sinusoidal parameters. Because the frequency-slope bias of the DEX algorithm increases with the modulation, we expect that the DSX estimator is especially advantageous if the modulation is strong.

A Real-World Example

To demonstrate that the advantages of the proposed estimator are effective in real-world situations, we implemented the bias-reduction methods in a complete additive modeling system. Theoretical investigation has been restricted to cover the case of resolved sinusoids only. For real-world applications, however, the algorithm must prove that it will perform well when the underlying model no longer holds (owing to transients, unresolved sinusoids caused by reverberation, etc.). The major problem in real-world signals is related to the fact that the enhanced frequency-slope estimation (DSX) described previously may produce extreme values whenever the underlying signal model does not match the observed peak. In these cases, the method may, for example, try to model the transient or nearby sinusoids by means of extreme slopes.

To prevent the degeneration of the estimator, we use a number of conditions that allow us to detect the cases for which the signal model used to analyze the peak does not hold. The conditions used to verify the reliability of the second-order polynomial model of the relation between demodulation slope

and amplitude are: (1) verification that the extremum of the polynomial model is a local maximum; (2) verification that the amplitude that is obtained with the optimal demodulation slope is larger than the amplitude obtained with the initial slope estimate; and (3) verification that the slope offset to reach the optimal slope is within 2Do. If one of these tests fails, the polynomial representation of the slope and amplitude relation is considered unreliable, and the DEX estimator is used as a fallback.

Table 1. Energy Reduction in the Residual Signal Obtained with Different Bias-Reduction Algorithms

Frequency Band    ASH        DEH        DSH
Full Range        4.19 dB    4.72 dB    5.04 dB
0–2 kHz           3.13 dB    3.75 dB    4.05 dB
2–4 kHz           7.32 dB    8.40 dB    9.33 dB
4–6 kHz           5.78 dB    6.90 dB    7.32 dB

The performance of the algorithms varies with the frequency band.
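The three reliability checks on the slope-amplitude polynomial described above can be sketched as follows. This is a minimal illustration, not the implementation used in the article: the function name `refine_slope`, the callable `measure_amp` (assumed to return the peak amplitude obtained after demodulating with a given slope), the probing `step`, and the `max_offset` bound standing in for the slope limit of condition 3 are all hypothetical. Fitting the parabola through three probes is likewise an assumption.

```python
import numpy as np


def refine_slope(measure_amp, initial_slope, step, max_offset):
    """Return a refined demodulation slope, or None to signal the fallback.

    measure_amp, step and max_offset are illustrative assumptions, not
    names or values taken from the article.
    """
    # Probe the amplitude at three slopes around the initial estimate.
    slopes = np.array([initial_slope - step, initial_slope, initial_slope + step])
    amps = np.array([measure_amp(s) for s in slopes])

    # Second-order polynomial model: amp ~ a*slope**2 + b*slope + c.
    a, b, _ = np.polyfit(slopes, amps, 2)

    # (1) The extremum must be a local maximum (parabola opens downward).
    if a >= 0:
        return None
    optimal_slope = -b / (2.0 * a)

    # (3) The offset from the initial estimate must stay within the bound.
    if abs(optimal_slope - initial_slope) > max_offset:
        return None

    # (2) The amplitude measured at the optimal slope must exceed the
    #     amplitude measured at the initial slope.
    if measure_amp(optimal_slope) <= amps[1]:
        return None

    return optimal_slope
```

A caller would treat `None` as the cue to keep the initial slope estimate, mirroring the fallback to the DEX estimator.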
The test used to verify the validity of the linear AM/FM sinusoidal representation is based on the center of gravity of the energy (the mean time) of the signal related to the spectral peak under investigation. If the mean time is larger than the maximum mean time that can be expected for the signal model of Equation 1, then we can assume that the peak is related to a sinusoid with transient amplitude evolution (Röbel 2003). In this situation, the exponential amplitude evolution used by the ASX estimator is more appropriate than the linear AM, and therefore the ASX estimator is used. Note that the ASX and DEX estimators are sub-modules required for the DSX estimator anyway, so the fallback solutions do not require additional costs in terms of implementation or calculations.

For the last experiment, we compare the estimators by examining the energy of the residual signal of a harmonic model of a tenor singer. The signal contains strong vibrato, and therefore the bias due to the non-stationary parameters is expected to be significant. The harmonic models contain a maximum of 30 sinusoids at each time instant. We calculate the variance of the residual signal for the QIFFT, DEH, DSH, and ASH methods for a signal window of 800 samples and an FFT size of 4,096 samples. The variance of the residual signal is compared to that of the QIFFT estimator, and the reduction of the residual energy in different frequency bands that can be achieved with each estimator is listed in Table 1.

From Table 1, we can conclude that all bias-reduction methods achieve significant improvements of the residual energy. It is interesting to compare performance in the different frequency bands. In the low band, the improvement is 3–4 dB. The improvement is less pronounced because the FM extent is low. In the mid-band range, the FM becomes stronger, and the reduction methods achieve a residual energy reduction of 7.3–9.3 dB. For the highest band, the FM is stronger still, but the noise level is higher as well, such that the reduction of the residual energy is not as strong.

The advantage of the demodulation methods over ASH is clearly visible. The DEX estimator improves the reduction of the ASH estimator by 0.5–1.1 dB. The DSX estimator is clearly the best, with an improvement compared to the ASH estimator of 0.8–2.0 dB. The residual signals for the QIFFT and DSH estimator are shown in Figure 3; the reduction of the residual energy is clearly visible.

Conclusions

We have shown that an efficient bias-reduction strategy for the estimation of sinusoidal parameters consists of a frequency-slope estimation and demodulation prior to application of the standard QIFFT estimator. The procedure significantly reduces the bias of the standard estimator. It does not require the use of a Gaussian analysis window, and it works for a much larger range of modulation depths than a recently proposed algorithm. By investigating the reduction in the residual energy that can be obtained for a vibrato signal, we have shown that the proposed enhanced demodulation estimator effectively works in real-world situations. It has been shown that, compared to the standard QIFFT estimator, the reduction of the residual error depends on the frequency range and can be as large as 6–9 dB.
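The improvement ranges quoted for Table 1 can be cross-checked with a few lines of arithmetic. The snippet below is only a sanity check on the published table values; the dictionary layout and the helper name `gain_over_ash` are illustrative choices, not part of the article.

```python
# Residual-energy reduction relative to QIFFT, in dB, copied from Table 1.
TABLE_1 = {
    "Full Range": {"ASH": 4.19, "DEH": 4.72, "DSH": 5.04},
    "0-2 kHz": {"ASH": 3.13, "DEH": 3.75, "DSH": 4.05},
    "2-4 kHz": {"ASH": 7.32, "DEH": 8.40, "DSH": 9.33},
    "4-6 kHz": {"ASH": 5.78, "DEH": 6.90, "DSH": 7.32},
}


def gain_over_ash(method):
    """Per-band gain (dB) of `method` over ASH, rounded to 0.01 dB."""
    return {band: round(row[method] - row["ASH"], 2) for band, row in TABLE_1.items()}


for method in ("DEH", "DSH"):
    gains = gain_over_ash(method)
    # Print the smallest and largest gain across all rows of the table.
    print(method, min(gains.values()), max(gains.values()))
```

The differences span 0.53 to 1.12 dB for DEH and 0.85 to 2.01 dB for DSH, which is consistent with the ranges stated in the text.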



Figure 3. Residual signal of a vibrato tenor singer using QIFFT estimator (a) and the enhanced demodulation method DSH (b).
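The band-wise comparison of residual energies behind Table 1 and Figure 3 could be computed along the following lines. This is a simplified sketch, not the article's evaluation code: it compares two whole residual signals through a single FFT rather than frame-wise variances, and the function name, arguments, and band edges are hypothetical.

```python
import numpy as np


def band_energy_reduction_db(res_ref, res_new, sample_rate, band_hz):
    """Energy reduction (dB) of res_new relative to res_ref within band_hz.

    Both residuals are assumed to be real-valued, equally long 1-D arrays.
    """
    res_ref = np.asarray(res_ref, dtype=float)
    res_new = np.asarray(res_new, dtype=float)
    lo, hi = band_hz

    # Frequencies of the one-sided spectrum bins.
    freqs = np.fft.rfftfreq(res_ref.size, d=1.0 / sample_rate)
    in_band = (freqs >= lo) & (freqs < hi)

    # Sum of squared magnitudes inside the band for each residual.
    e_ref = np.sum(np.abs(np.fft.rfft(res_ref))[in_band] ** 2)
    e_new = np.sum(np.abs(np.fft.rfft(res_new))[in_band] ** 2)

    # Positive values mean the new residual carries less energy.
    return 10.0 * np.log10(e_ref / e_new)
```

For instance, halving the residual amplitude in every band yields 10·log10(4), or about 6.02 dB, in any band.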

The computational costs of this algorithm are significantly higher than those for the standard estimator. A rough investigation into the computational parameter-estimation complexity for a single spectral peak has shown that, compared to the standard QIFFT algorithm, the ASX, DEX, and DSX algorithms increase the computational complexity by factors of 2, 6, and 16, respectively. These numbers reflect only the parameter estimation, excluding the DFT and the peak detection. Although the DSX algorithm is relatively costly, our tests have shown that for the analysis parameters used in the real-world example, real-time estimation of 10–20 sinusoids per analysis frame can be achieved.

References

Abatzoglou, T. 1986. Fast Maximum Likelihood Joint Estimation of Frequency and Frequency Rate. Proceedings of the 1986 International Conference on Acoustics, Speech and Signal Processing, Vol. II. New York: Institute for Electrical and Electronics Engineers, pp. 1409–1412.
Abe, M., and J. O. Smith. 2004. Design Criteria for the Quadratically Interpolated FFT Method I: Bias Due to Interpolation. Technical Report STAN-M-117, Stanford University, Department of Music. Available online at ccrma.stanford.edu/STANM/stanms/stanm114/index.html.
Abe, M., and J. O. Smith. 2005. AM/FM Rate Estimation for Time-Varying Sinusoidal Modeling. Proceedings of the 2005 International Conference on Acoustics, Speech and Signal Processing, Vol. III. New York: Institute for Electrical and Electronics Engineers, pp. 201–204.
Amatriain, X., et al. 2002. Spectral Processing. In U. Zölzer, ed. Digital Audio Effects. New York: Wiley, pp. 373–438.
Marques, J. S., and L. B. Almeida. 1986. A Background for Sinusoid Based Representation of Voiced Speech. Proceedings of the 1986 International Conference on Acoustics, Speech and Signal Processing, Vol. II. New York: Institute for Electrical and Electronics Engineers, pp. 1233–1236.
Offelli, C., and D. Petri. 1992. The Influence of Windowing on the Accuracy of Multifrequency Signal Parameter Estimation. IEEE Transactions on Instrumentation and Measurement 41(2):256–261.
Peeters, G. 2001. Modèles et modification du signal sonore adaptés à ses caractéristiques locales. Ph.D. Thesis, Université Paris 6. Available online at recherche.ircam.fr/equipes/analyse-synthese/peeters/ARTICLES/Peeters_2001_PhDThesisv1.1.pdf.
Peeters, G., and X. Rodet. 1999. SINOLA: A New Analysis/Synthesis Method Using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum. Proceedings of the 1999 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 153–156.
Quatieri, T. F., and R. J. McAulay. 1986. Speech Transformation Based on a Sinusoidal Representation. IEEE Transactions on Acoustics, Speech, and Signal Processing 34(6):1449–1464.
Ristic, B., and B. Boashash. 1998. Comments on The Cramer-Rao Lower Bounds for Signals with Constant Amplitude and Polynomial Phase. IEEE Transactions on Signal Processing 46(6):1708–1709.
Röbel, A. 2003. A New Approach to Transient Processing in the Phase Vocoder. Proceedings of the 6th International Conference on Digital Audio Effects (DAFx03). London: Queen Mary, University of London, pp. 344–349.
Röbel, A. 2006. Estimation of Partial Parameters for Non Stationary Sinusoids. Proceedings of the 2006 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 167–170.
Saha, S., and S. M. Kay. 2002. Maximum Likelihood Parameter Estimation of Superimposed Chirps Using Monte Carlo Importance Sampling. IEEE Transactions on Signal Processing 50(2):224–230.
