
Maximum Negentropy Beamforming using

Complex Generalized Gaussian Distribution Model


Kenichi Kumatani
Disney Research, Pittsburgh
Barbara Rauch
Saarland University, Germany
John McDonough
Disney Research, Pittsburgh
Dietrich Klakow
Saarland University, Germany
Abstract: This paper presents a new beamforming method for
distant speech recognition. In contrast to conventional beamforming
techniques, our beamformer adjusts the active weight vectors
so as to make the distribution of the beamformer's outputs as super-
Gaussian as possible. That is achieved by maximizing the negentropy
of the outputs.
In our previous work, the generalized Gaussian probability
density function (GG-PDF) for real-valued random variables
(RVs) was used to model the magnitude of a speech signal;
the subband components themselves were not directly modeled.
Accordingly, it could not represent the distribution of the subband
signal faithfully. In this work, we use the GG-PDF for complex RVs
in order to model the subband components directly. The appropriate
amount of data for adapting the active weight vector is also
studied. The performance of the beamforming techniques is
investigated through a series of automatic speech recognition
experiments on the Multi-Channel Wall Street Journal Audio
Visual Corpus (MC-WSJ-AV). The data was recorded with real
sensors in a real meeting room, and hence contains noise from
computers, fans, and other apparatus in the room. The test data
is neither artificially convolved with measured impulse responses
nor unrealistically mixed with separately recorded noise.
I. INTRODUCTION
Microphone array processing techniques for distant speech
recognition (DSR) have the potential to relieve users from
the necessity of donning close talking microphones (CTMs)
before interacting with automatic speech recognition (ASR)
systems [1], [2].
Adaptive beamforming is a promising technique for DSR.
A conventional beamformer in generalized sidelobe canceller
(GSC) configuration is structured such that the direct signal
from a desired direction is undistorted [2, §6.7.3]. Typical
GSC beamformers consist of three blocks: a quiescent vector,
a blocking matrix, and an active weight vector. The quiescent vector
is calculated to provide unity gain for the direction of interest.
The blocking matrix is constructed so as to maintain the
distortionless constraint for the signal filtered with the
quiescent vector. Subject to this constraint, the variance of the
beamformer's output is minimized through the adjustment of
the active weight vector, which effectively places a null on any
source of interference, but can also lead to undesirable signal
cancellation [3]. To avoid the latter, many algorithms have
been developed; see [2, §13.5] for a review. However, those
algorithms based on the minimum variance criterion cannot
eliminate the signal cancellation effects.
In our previous work, we considered different criteria used
in the field of independent component analysis (ICA) for
estimation of the active weight vector. The theory of ICA
states that nearly all information-bearing signals, like subband
samples of speech, are super-Gaussian [4]. On the other hand,
noisy or reverberant speech consists of a sum of several signals,
and as such tends to have a distribution that is closer to Gaussian.
This follows from the central limit theorem, and can be
empirically verified [5]. Hence, by making the distribution of
the beamformer's outputs as super-Gaussian as possible,
we can suppress the effects of noise and reverberation.
In [5], [6], we proposed a novel beamforming algorithm
which adjusted the active weight vectors so as to make the
beamformer's output maximally super-Gaussian. As a measure
for the degree of super-Gaussianity we use negentropy, which
is defined as the difference between the entropy of Gaussian
and super-Gaussian random variables (RVs). We also showed
in [5] that such a beamformer can reduce noise and reverberation
without suffering from the signal cancellation problem.
For calculating the negentropy of the magnitude of subband samples,
the uni-variate generalized Gaussian (GG) PDF for
real-valued RVs was used in [5], [6]. However, it may be
inaccurate for modeling the subband samples of speech, since those
components are complex and nearly second-order circular [7].
Accordingly, in this work we consider the GG-PDF for complex-valued
RVs under the second-order circular condition and apply it to
maximum negentropy beamforming.
The balance of this paper is organized as follows. Section II
describes the GG-PDFs for real-valued RVs and complex-valued
RVs in the case of the strict second-order circular condition.
Then, a method of training the parameters of the complex
GG-PDF is described in Section II-C. Section III reviews the
definition of negentropy. In Section IV, we describe maximum
negentropy beamforming algorithms with super-directivity. In
Section V, we describe the results of far-field automatic speech
recognition experiments. Finally, in Section VI, we present our
conclusions and plans for future work.
II. GENERALIZED GAUSSIAN PROBABILITY DENSITY
FUNCTION (GG-PDF)
A. Uni-Variate (real) GG-PDF
The GG-PDF for real-valued RVs finds frequent application
in the blind source separation (BSS) and ICA fields [8].
It can be readily controlled with two kinds of parameters,
namely, the shape and scale parameters, so as to fit a
distribution of speech.
1420 978-1-4244-9720-1/10/$26.00 2010 IEEE Asilomar 2010
The uni-variate GG-PDF with zero mean for a real-valued
RV y can be expressed as

p_{GG}(y) = \frac{f}{2\,\Gamma(1/f)\,A_f\,\hat{\sigma}} \exp\left\{ -\left| \frac{y}{A_f \hat{\sigma}} \right|^{f} \right\},   (1)

where \hat{\sigma} is the scale parameter, f is the shape parameter which
controls how fast the tail of the PDF decays, and

A_f = \left[ \frac{\Gamma(1/f)}{\Gamma(3/f)} \right]^{1/2}.   (2)

In (2), \Gamma(\cdot) is the gamma function. Note that the GG-PDF with
f = 1 corresponds to the Laplace PDF, and that setting f = 2
yields the Gaussian PDF, whereas in the case of f \to +\infty the
GG-PDF converges to a uniform distribution.
As described in [5], the maximum likelihood solution for the
scale parameter differs from the variance whenever f \neq 2.
Therefore, we distinguish the scale parameter from the variance.
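To make the parameterization concrete, the density (1) can be sketched in a few lines of Python. This is our own illustrative code, not part of the paper; the function names are ours. It also verifies the remark above: with f = 2, (1) reduces to the Gaussian density.

```python
import math

def real_gg_pdf(y, sigma, f):
    """Univariate generalized Gaussian PDF (1) with zero mean.

    sigma is the scale parameter, f the shape parameter;
    A_f normalizes the scale as in (2)."""
    A_f = math.sqrt(math.gamma(1.0 / f) / math.gamma(3.0 / f))
    norm = f / (2.0 * math.gamma(1.0 / f) * A_f * sigma)
    return norm * math.exp(-abs(y / (A_f * sigma)) ** f)

def gaussian_pdf(y, sigma):
    # reference Gaussian density with standard deviation sigma
    return math.exp(-y ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
```

With this parameterization the variance of (1) equals \hat{\sigma}^2 for every shape f, so curves for different f are directly comparable.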
B. Complex GG-PDF
In the case that the complex-valued RV, Y, has the second-order
circular property, the complex GG-PDF can be expressed with
shape parameter f_b and scale parameter \hat{\sigma}_a as

p_{GG,a}(Y) = \frac{f_b}{\pi\,\Gamma(1/f_b)\,B_f\,\hat{\sigma}_a} \exp\left\{ -\left( \frac{|Y|^2}{B_f \hat{\sigma}_a} \right)^{f_b} \right\},   (3)

where

B_f = \frac{\Gamma(1/f_b)}{\Gamma(2/f_b)}.   (4)

By comparing (1) with (3), we can see that the real and complex
GG-PDFs differ only in their normalization factors.
It is clear that the complex PDF (3) is the same for Y and
Y exp(j\theta) for any \theta. This property is referred to as strict
circularity [9]. Moreover, the second-order statistics of Y and
Y exp(j\theta) are the same, which characterizes second-order
circularity. It is reported in [7] that the distribution of speech
DFT coefficients is nearly circular but not independent. In this
work, we also assume that the subband components are circular,
which leads to a significant simplification of the GG-PDF. Notice
that the term proper is sometimes used instead of circular, as in [10].
Figure 1 shows the log-likelihood of the complex GG-PDF (3)
with unit variance. As with the uni-variate GG-PDF, a smaller
shape parameter leads to a sharper concentration at zero.
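The curves in Figure 1 can be reproduced directly from (3). The following Python sketch, our own illustrative code rather than the authors' implementation, evaluates the log-likelihood; for f_b = 1 it collapses to the circular complex Gaussian log-density, as the figure legend indicates.

```python
import math

def complex_gg_logpdf(Y, sigma_a, f_b):
    """Log of the circular complex GG-PDF (3).

    Y is a complex subband sample, sigma_a the scale parameter,
    f_b the shape parameter; B_f is the normalization in (4)."""
    B_f = math.gamma(1.0 / f_b) / math.gamma(2.0 / f_b)
    log_norm = math.log(f_b) - math.log(math.pi * math.gamma(1.0 / f_b) * B_f * sigma_a)
    return log_norm - (abs(Y) ** 2 / (B_f * sigma_a)) ** f_b
```

Setting f_b = 1 gives log p(Y) = -log(pi sigma_a) - |Y|^2 / sigma_a, i.e. the circular complex Gaussian with variance sigma_a.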
C. Method for Estimating Scale and Shape Parameters
In this section, we present a training method only for the
parameters of the complex GG-PDF under the circular
condition (3). The formulae for estimating the parameters of the
GG-PDF for real-valued RVs can be found in [5], [8], [11].
In this work, we initialize the scale parameter of the complex
GG-PDF with the variance and then update the parameters
based on the maximum likelihood (ML) criterion. The shape
parameters are estimated from training samples offline and are
then held fixed during beamforming. The shape parameters are
estimated independently for each subband, as the optimal PDF
is frequency-dependent.

Fig. 1. Log-likelihood of the complex GG-PDF for shape parameters f_b = 0.25, 0.5, 1 (Gaussian), and 2.
For a set \mathcal{Y} = \{Y_0, Y_1, \ldots, Y_{N-1}\} of N training subband
samples, the log-likelihood function under the GG-PDF
assumption can be expressed as

l(\mathcal{Y}; \hat{\sigma}_a, f_b) = N \left[ \log(f_b) - \log\{ \pi\,\Gamma(1/f_b)\,B_f\,\hat{\sigma}_a \} \right] - \sum_{n=0}^{N-1} \left( \frac{|Y_n|^2}{B_f \hat{\sigma}_a} \right)^{f_b}.   (5)
The parameters \hat{\sigma}_a and f_b can be obtained by solving the
following equations:

\frac{\partial l(\mathcal{Y}; \hat{\sigma}_a, f_b)}{\partial \hat{\sigma}_a} = -\frac{N}{\hat{\sigma}_a} + \frac{f_b}{\hat{\sigma}_a^{f_b+1}} \sum_{n=0}^{N-1} \left( \frac{|Y_n|^2}{B_f} \right)^{f_b} = 0,   (6)
\frac{\partial l(\mathcal{Y}; \hat{\sigma}_a, f_b)}{\partial f_b} = N \left[ \frac{1}{f_b} + \frac{2}{f_b^2}\Psi(1/f_b) - \frac{2}{f_b^2}\Psi(2/f_b) \right] - \sum_{n=0}^{N-1} \left( \frac{|Y_n|^2}{B_f \hat{\sigma}_a} \right)^{f_b} \left[ \log\left( \frac{|Y_n|^2}{B_f \hat{\sigma}_a} \right) + \frac{1}{f_b} \{ \Psi(1/f_b) - 2\Psi(2/f_b) \} \right] = 0,   (7)

where \Psi(\cdot) is the digamma function. By solving (6) for \hat{\sigma}_a,
we obtain

\hat{\sigma}_a = \frac{1}{B_f} \left[ \frac{f_b}{N} \sum_{n=0}^{N-1} |Y_n|^{2 f_b} \right]^{1/f_b}.   (8)
Due to the presence of the special functions, it is impossible
to solve (7) explicitly for f_b. Accordingly, we resort
to the golden section search algorithm [12].
The training algorithm can be summarized as follows:
1) Initialize the scale parameter \hat{\sigma}_a with the variance:

\hat{\sigma}_a = \frac{1}{N} \sum_{n=0}^{N-1} |Y_n|^2.   (9)
TABLE I
DIFFERENTIAL ENTROPY FOR EACH TYPE OF GG-PDF.

PDF type               | Differential entropy
Complex Gaussian PDF   | \log(\pi \hat{\sigma}_a^2) + 1, where \hat{\sigma}_a^2 = \frac{1}{N} \sum_{n=0}^{N-1} |Y_n|^2   (11)
GG-PDF (1)             | \log\{ 2\,\Gamma(1 + 1/f)\,A_f\,\hat{\sigma} \} + 1/f   (12)
Complex GG-PDF (3)     | \log\{ \pi\,\Gamma(1 + 1/f_b)\,B_f\,\hat{\sigma}_a \} + 1/f_b   (13)
2) With the golden section algorithm, find the shape parameter
f_b which provides the maximum likelihood.
3) Compute the scale parameter using (8).
4) Repeat steps 2) and 3) until the log-likelihood function (5)
converges.
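The training loop above can be rendered as a minimal Python sketch under the stated equations: (5) for the likelihood, (8) for the scale, and a golden-section search over f_b. This is our own illustration, not the authors' released code; the search bounds and tolerance are arbitrary choices.

```python
import math

def log_likelihood(Y, sigma_a, f_b):
    # log-likelihood (5) of N complex samples under the circular GG-PDF
    N = len(Y)
    B_f = math.gamma(1.0 / f_b) / math.gamma(2.0 / f_b)
    ll = N * (math.log(f_b) - math.log(math.pi * math.gamma(1.0 / f_b) * B_f * sigma_a))
    return ll - sum((abs(y) ** 2 / (B_f * sigma_a)) ** f_b for y in Y)

def scale_ml(Y, f_b):
    # closed-form ML scale parameter (8) for a given shape f_b
    N = len(Y)
    B_f = math.gamma(1.0 / f_b) / math.gamma(2.0 / f_b)
    return ((f_b / N) * sum(abs(y) ** (2 * f_b) for y in Y)) ** (1.0 / f_b) / B_f

def golden_section_shape(Y, lo=0.1, hi=4.0, tol=1e-4):
    # golden-section search for the f_b maximizing (5), with the
    # scale tied to f_b through (8); bounds lo/hi are our choice
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    obj = lambda f: log_likelihood(Y, scale_ml(Y, f), f)
    a, b = lo, hi
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if obj(c) > obj(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return 0.5 * (a + b)
```

For samples drawn from a circular complex Gaussian, the estimated shape should land near f_b = 1, consistent with the legend of Figure 1.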
III. NEGENTROPY
There are two popular criteria of non-Gaussianity, namely,
kurtosis and negentropy. The kurtosis criterion can be computed
without any PDF assumption. However, the value of
kurtosis can be greatly influenced by a few samples with
low observation probability. In [13], we applied the maximum
kurtosis criterion to beamforming and showed that the
negentropy criterion is more robust than the kurtosis measure,
especially when only a small amount of data is available
for adaptation. Hence, we base the measurement of super-
Gaussianity on negentropy.
The negentropy of a complex-valued RV, Y, can be expressed as

J_d(Y) = H_{gauss}(Y) - \alpha H_{sg}(Y).   (10)

H_{gauss}(Y) stands for the differential entropy of the Gaussian
PDF with the same variance as Y, and H_{sg}(Y) is the differential
entropy of the super-Gaussian PDF. In the normal
definition, \alpha is unity. In that case, the negentropy is non-
negative, and it is zero if and only if Y has a Gaussian
distribution. However, we observed that the differential entropy
of the complex GG-PDF becomes small and very influential
relative to that of the Gaussian PDF. Accordingly, we adjust
the balance by multiplying H_{sg}(Y) by a coefficient \alpha.
In the experiments, we empirically set \alpha = 0.5.
Table I lists the differential entropies of the Gaussian
distribution for the complex-valued RV, of the uni-variate GG-PDF
for the real-valued RV, and of the complex GG-PDF under the
circular condition.
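Combining (10) with the entropies (11) and (13) from Table I gives a simple empirical negentropy for a block of samples. The Python sketch below is our own illustration of that computation, with the scale estimated by (8); it is not the authors' implementation.

```python
import math

def block_negentropy(Y, f_b, alpha=0.5):
    """Empirical negentropy (10) of complex samples Y.

    The Gaussian entropy (11) uses the sample variance; the
    super-Gaussian entropy (13) uses the circular complex GG-PDF
    with shape f_b and the ML scale (8). alpha weights the
    super-Gaussian term as described in the text."""
    N = len(Y)
    var = sum(abs(y) ** 2 for y in Y) / N
    B_f = math.gamma(1.0 / f_b) / math.gamma(2.0 / f_b)
    sigma_a = ((f_b / N) * sum(abs(y) ** (2 * f_b) for y in Y)) ** (1.0 / f_b) / B_f
    H_gauss = math.log(math.pi * var) + 1.0
    H_sg = math.log(math.pi * math.gamma(1.0 + 1.0 / f_b) * B_f * sigma_a) + 1.0 / f_b
    return H_gauss - alpha * H_sg
```

With f_b = 1 and alpha = 1 the two entropies coincide and the negentropy vanishes, matching the classical property that negentropy is zero for a Gaussian model.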
IV. MAXIMUM NEGENTROPY BEAMFORMING
A. Generalized Sidelobe Canceller Configuration
Consider a subband beamformer in the GSC configuration
[2, §13.7.3]. The output of our beamformer for a given
subband m at frame k can be expressed as

Y(k, m) = \left( \mathbf{w}_{SD}(k, m) - \mathbf{B}(k, m)\,\mathbf{w}_a(k, m) \right)^H \mathbf{X}(k, m),   (14)

where \mathbf{w}_{SD}(k, m) is the quiescent weight vector for a source,
\mathbf{B}(k, m) is the blocking matrix, \mathbf{w}_a(k, m) is the active weight
vector, and \mathbf{X}(k, m) is the input subband snapshot vector.
In this work, the weights of the super-directive beamformer
are used as the quiescent weight vector [6]. The blocking
matrix is constructed to satisfy the orthogonality condition
\mathbf{B}^H(k, m)\,\mathbf{w}_{SD}(k, m) = 0. This orthogonality implies that
the distortionless constraint will be satisfied for any choice of
\mathbf{w}_a. The blocking matrix is then calculated with the modified
Gram-Schmidt procedure [14].
While the active weight vector \mathbf{w}_a is typically chosen to
minimize the variance of the beamformer's outputs, which
leads to the undesired signal cancellation, here we develop
optimization procedures to find the \mathbf{w}_a which maximizes the
negentropy J(Y) described in Section III.
For the experiments described in Section V, subband analysis
and synthesis were performed with a uniform DFT filter
bank based on the modulation of a single prototype impulse
response [2, §11.7], which was designed to minimize each
aliasing term individually.
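The structure of (14) and the orthogonality condition B^H w_SD = 0 can be illustrated with a small Python sketch (ours, not the authors' code): the blocking matrix is built with modified Gram-Schmidt, and the output for a snapshot proportional to the look-direction weights is unchanged by any choice of w_a, which is the distortionless property noted above.

```python
import math

def hdot(a, b):
    # Hermitian inner product a^H b
    return sum(x.conjugate() * y for x, y in zip(a, b))

def blocking_matrix(w_sd):
    """Blocking matrix as a list of N-1 orthonormal columns with
    B^H w_sd = 0, built by the modified Gram-Schmidt procedure."""
    N = len(w_sd)
    basis = [list(w_sd)]  # first direction to reject: the quiescent vector
    cols = []
    for i in range(N):
        v = [1.0 + 0.0j if j == i else 0.0 + 0.0j for j in range(N)]
        for b in basis:
            c = hdot(b, v) / hdot(b, b).real
            v = [vj - c * bj for vj, bj in zip(v, b)]
        norm = math.sqrt(hdot(v, v).real)
        if norm > 1e-8:  # skip unit vectors nearly inside the current span
            v = [vj / norm for vj in v]
            basis.append(v)
            cols.append(v)
        if len(cols) == N - 1:
            break
    return cols

def gsc_output(w_sd, B_cols, w_a, X):
    # GSC beamformer output (14): Y = (w_sd - B w_a)^H X
    N = len(w_sd)
    Bw = [sum(B_cols[c][j] * w_a[c] for c in range(len(B_cols))) for j in range(N)]
    return hdot([w_sd[j] - Bw[j] for j in range(N)], X)
```

Because B^H w_sd = 0, the term w_a^H B^H X vanishes whenever X lies along w_sd, so the look direction is passed with fixed gain regardless of w_a.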
B. Estimation of Active Weights
Due to the absence of a closed-form solution, we must resort
to numerical optimization in order to obtain the active weight
vectors. In this section, we omit the frequency index m for the
sake of simplicity.
In prior work, we used an entire utterance to estimate the
active weight vector. In many applications, however, it is
preferable to update those weights with a small amount of
adaptation data. In this work, we therefore calculate the
negentropy of the GSC beamformer output for a block of input
subband samples instead of using the entire utterance.
In order to calculate the negentropy, we first need the
variance of the beamformer outputs Y(k). The variance of
the outputs at each block l can be calculated as

\sigma_{Y_l}^2 = \frac{1}{K_l} \sum_{k=0}^{K_l - 1} |Y(k)|^2.   (15)
It is also necessary to calculate the scale parameter from the
beamformer's outputs. In the case of the complex GG-PDF (3),
based on (8), we calculate the scale parameter from the outputs
at block l with

\hat{\sigma}_{a, Y_l} = \frac{1}{B_f} \left[ \frac{f_b}{K_l} \sum_{k=0}^{K_l - 1} |Y(k)|^{2 f_b} \right]^{1/f_b}.   (16)
Here, we derive the formula for the gradient under the
complex GG-PDF assumption. Upon substituting (11) and (13)
into (10) and replacing \hat{\sigma}_a^2 and \hat{\sigma}_a with \sigma_{Y_l}^2 and \hat{\sigma}_{a, Y_l}, we obtain
the negentropy at each block:

J_l(Y) = \log(\pi \sigma_{Y_l}^2) + 1 - \alpha \left[ \log\{ \pi\,\Gamma(1 + 1/f_b)\,B_f\,\hat{\sigma}_{a, Y_l} \} + 1/f_b \right].   (17)
In conventional beamforming, a regularization term is often
applied that penalizes large active weight vectors, and thereby
improves robustness by inhibiting the formation of excessively
large sidelobes [2]. Such a regularization term can be applied
in the present instance by defining the modified optimization
criterion

J_l(Y; \gamma) = J_l(Y) - \gamma \| \mathbf{w}_a(l) \|^2   (18)

for some real \gamma > 0. We set \gamma = 0.1 based on the results of
the speech recognition experiments in prior work [5].
Now, upon substituting (17) into (18) and taking the partial
derivative with respect to \mathbf{w}_a(l), we find the gradient

\frac{\partial J_l(Y; \gamma)}{\partial \mathbf{w}_a^*(l)} = -\frac{1}{K_l} \sum_{k=0}^{K_l - 1} \left[ \frac{1}{\sigma_{Y_l}^2} - \frac{\alpha f_b |Y(k)|^{2 f_b - 2}}{( B_f \hat{\sigma}_{a, Y_l} )^{f_b}} \right] \mathbf{B}^H(k)\,\mathbf{X}(k)\,Y^*(k) - \gamma\,\mathbf{w}_a(l).   (19)
Based on (19), the active weight vector can be estimated
with K_l subband samples at the l-th block. The block-wise
update method can be summarized as follows:
1) Initialize the active weight vector with \mathbf{w}_a(0) = 0.
2) Given estimates of the time delays, calculate the quiescent
vector and blocking matrix.
3) For each block of input subband samples, l = 1, 2, \ldots,
estimate the active weight vector \mathbf{w}_a(l) with the
Polak-Ribière conjugate gradient algorithm using (18)
and (19) until it converges.
4) Initialize the active weight vector for the next block and
go to step 2).
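For reference, one gradient evaluation for a single block can be sketched in Python as below. This is our own illustration, not the authors' implementation: it computes the outputs Y(k) via (14), the block statistics (15) and (16), and returns the gradient (19) including the regularization term of (18); the conjugate gradient search itself is omitted, and all names are ours.

```python
import math

def hdot(a, b):
    # Hermitian inner product a^H b
    return sum(x.conjugate() * y for x, y in zip(a, b))

def negentropy_gradient(w_sd, B_cols, w_a, X_block, f_b, alpha=0.5, gamma=0.1):
    """Gradient (19) of the regularized block negentropy (18) w.r.t. w_a^*.

    X_block is the list of subband snapshot vectors for one block."""
    K, N, M = len(X_block), len(w_sd), len(B_cols)
    B_f = math.gamma(1.0 / f_b) / math.gamma(2.0 / f_b)
    # beamformer outputs Y(k) of (14) for the current active weights
    Bw = [sum(B_cols[c][j] * w_a[c] for c in range(M)) for j in range(N)]
    w = [w_sd[j] - Bw[j] for j in range(N)]
    Y = [hdot(w, X) for X in X_block]
    # block variance (15) and ML scale estimate (16)
    var = sum(abs(y) ** 2 for y in Y) / K
    sigma_a = ((f_b / K) * sum(abs(y) ** (2 * f_b) for y in Y)) ** (1.0 / f_b) / B_f
    # accumulate the sum in (19), then append the regularization term
    grad = [0.0 + 0.0j] * M
    for X, y in zip(X_block, Y):
        coef = 1.0 / var - alpha * f_b * abs(y) ** (2 * f_b - 2) / (B_f * sigma_a) ** f_b
        BHX = [hdot(col, X) for col in B_cols]  # B^H X(k)
        for c in range(M):
            grad[c] -= (coef / K) * BHX[c] * y.conjugate()
    return [g - gamma * wa for g, wa in zip(grad, w_a)]
```

Since the criterion is real-valued, a finite-difference check against the block objective confirms that a real (imaginary) perturbation of a weight changes the objective by twice the real (imaginary) part of this complex gradient, per the usual Wirtinger convention.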
V. EXPERIMENTS
We performed far-field automatic speech recognition (ASR)
experiments on the Multi-Channel Wall Street Journal Audio
Visual Corpus (MC-WSJ-AV) from the Augmented Multi-party
Interaction (AMI) project; see Lincoln et al. [1] for details of
the data collection apparatus. The room size is 650 cm × 490 cm
× 325 cm and the reverberation time T60 was approximately
380 ms. In addition to reverberation, some recordings
include significant amounts of background noise such as
computer fan and air conditioner noise. The far-field speech
data was recorded with two circular, equi-spaced eight-channel
microphone arrays with a diameter of 20 cm. Additionally, a
close talking headset microphone (CTM) was used for each
speaker. The sampling rate of the recordings was 16 kHz. In
the single speaker stationary scenario of the MC-WSJ-AV, a
speaker was asked to read sentences from six positions: four
seated around the table, one standing at the white board, and
one standing at the presentation screen.
Our test data set for the experiments contains recordings of
10 speakers, where each speaker reads approximately 40 sentences
taken from the 5,000 word vocabulary Wall Street
Journal (WSJ) task. This gives a total of 352 utterances, which
correspond to 39.2 minutes of speech. There are a total of
11,598 word tokens in the reference transcriptions. The test
data does not overlap with the training data.
Prior to beamforming, we first estimated the speaker's
position with the Orion source tracking system [15]. Based
on the average speaker position estimated for each utterance,
the active weight vectors \mathbf{w}_a were estimated for the source. In
the experiments, we examined the amount of data used for
adaptation. Zelinski post-filtering [16] was performed after
beamforming. The parameters of the GG-PDF were trained
with 43.9 minutes of speech data recorded with the CTM in
the SSC development set; this training set contains recordings
of 5 speakers.
We performed four decoding passes on the waveforms
obtained with each of the beamforming algorithms described
in the prior sections. The details of the ASR system used in
the experiments are given in [5]. Each pass of decoding used
a different acoustic model or speaker adaptation scheme. For
all passes save the first unadapted pass, speaker adaptation
parameters were estimated using the word lattices generated
during the prior pass, as in [17]. A description of the four
decoding passes follows:
1. Decode with the unadapted, conventional ML acoustic
model and bigram language model (LM).
2. Estimate vocal tract length normalization (VTLN) [2, §9]
parameters and constrained maximum likelihood linear
regression (CMLLR) [2, §9] parameters for each speaker,
then redecode with the conventional ML acoustic model
and bigram LM.
3. Estimate VTLN, CMLLR, and maximum likelihood linear
regression (MLLR) [2, §9.2] parameters for each speaker,
then redecode with the conventional model and bigram LM.
4. Estimate VTLN, CMLLR, and MLLR parameters for each
speaker, then redecode with the ML-SAT model [2, §8.1]
and bigram LM.
Table II shows the word error rates (WERs) for every
beamforming algorithm. As references, the WERs obtained on
speech data recorded with a single distant microphone (SDM)
and with the CTM are also given in Table II.
It is clear from Table II that the best recognition performance,
a WER of 12.1%, is obtained by maximum negentropy
beamforming with super-directivity under the real GG-PDF
assumption (SD-MN BF with GG-PDF). It is also clear
from Table II that the super-directive maximum negentropy
beamforming algorithm with the complex GG-PDF (SD-MN
BF with CGGD-PDF) provides the second best recognition
performance, a WER of 12.2%. Comparing those results, we are
led to conclude that there is no significant difference between
the real and complex GG-PDF assumptions in terms of speech
recognition performance. In these experiments, the active
weight vectors of all the maximum negentropy beamformers
were iteratively estimated by the Polak-Ribière conjugate
gradient algorithm on one utterance of data.
It can be seen from Table II that the conventional maximum
negentropy beamforming algorithm (Conventional MN BF)
provides better recognition performance than the other traditional
beamforming methods: the delay-and-sum beamformer (D&S
BF), the super-directive beamformer (SD BF), and the minimum
variance distortionless response (MVDR) beamformer. Notice
TABLE II
WORD ERROR RATES FOR EACH BEAMFORMING ALGORITHM AFTER
EVERY DECODING PASS.
Beamforming Pass (%WER)
Algorithm 1 2 3 4
D&S BF 79.0 38.1 20.2 16.5
MVDR BF 78.6 35.4 18.8 14.8
SD BF 71.4 31.9 16.6 14.1
GEV BF 78.7 35.5 18.6 14.5
Conventional MN BF 75.1 32.7 16.5 13.2
SD-MN BF with GG-PDF 74.9 32.1 15.4 12.1
SD-MN BF with CGGD-PDF 75.3 30.9 15.5 12.2
SDM 87.0 57.1 32.8 28.0
CTM 52.9 21.5 9.8 6.7
TABLE III
WERS AS A FUNCTION OF THE AMOUNT OF ADAPTATION DATA.
Data amount Pass (%WER)
1 2 3 4
0.25 sec. 77.3 34.3 17.6 14.9
0.50 sec. 76.7 32.6 16.1 13.2
0.75 sec. 76.1 31.5 16.1 12.7
1.00 sec. 76.2 32.2 15.9 12.3
one sample 76.7 33.3 17.9 14.9
that MVDR beamforming algorithms require speech activity
detection in order to avoid signal cancellation. For the
adaptation of the MVDR beamformer, we used the first and
last 0.1 seconds of each utterance, which contain only
background noise. Again, in contrast to conventional
beamforming methods, our algorithm does not need to detect
the start and end points of the target speech since the proposed
method can suppress noise and reverberation without the signal
cancellation problem. Table II also shows the recognition
results obtained with the generalized eigenvector beamformer
(GEV BF) proposed by Warsitz et al. [18]. It achieved
slightly better recognition performance than the MVDR
beamformer on this task. It is worth noting that the best result of
12.1% in Table II is significantly less than half the word error
rate reported elsewhere in the literature on this far-field ASR
task [1].
Table III shows the WERs as a function of the amount of data
used to estimate the active weight vector. In these
experiments, the active weight vectors are iteratively updated
in the block-wise manner described in Section IV-B. The first
column of Table III indicates the duration of one block. It is
clear that a larger block of data leads to better recognition
performance. This is because the gradient approximation
becomes more stable. In this task, one second of speech data was
enough to obtain an accurate gradient approximation. On the
other hand, in the case that the active weight vectors were
updated by the steepest descent algorithm at each frame
(one sample), good recognition performance was not
obtained due to noisy instantaneous gradient values.
VI. CONCLUSIONS
In this work, we investigated maximum negentropy
beamforming algorithms with super-directivity. We applied
the GG-PDF for complex-valued RVs to the MN
beamforming algorithm, although we did not observe a
significant difference between the real and complex GG-PDF
assumptions. We also described a block-wise estimation
method for MN beamforming and examined the appropriate
amount of data for adaptation. In this task, one second
of speech data was enough to obtain an accurate gradient
approximation.
REFERENCES
[1] M. Lincoln, I. McCowan, I. Vepa, and H. K. Maganti, "The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments," in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005, pp. 357–362.
[2] M. Wölfel and J. McDonough, Distant Speech Recognition. New York: Wiley, 2009.
[3] B. Widrow, K. M. Duvall, R. P. Gooch, and W. C. Newman, "Signal cancellation phenomena in adaptive antennas: Causes and cures," IEEE Transactions on Antennas and Propagation, vol. AP-30, pp. 469–478, 1982.
[4] A. Hyvärinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Networks, 2000.
[5] K. Kumatani, J. McDonough, B. Rauch, D. Klakow, P. N. Garner, and W. Li, "Beamforming with a maximum negentropy criterion," IEEE Transactions on Audio, Speech and Language Processing, vol. 17, pp. 994–1008, 2009.
[6] K. Kumatani, L. Lu, J. McDonough, A. Ghoshal, and D. Klakow, "Maximum negentropy beamforming with superdirectivity," in Proc. Eusipco, Aalborg, Denmark, 2010.
[7] J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, "Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors," IEEE Transactions on Audio, Speech and Language Processing, vol. 15, pp. 1741–1752, 2007.
[8] M. Novey, T. Adalı, and A. Roy, "A complex generalized Gaussian distribution: characterization, generation, and estimation," IEEE Transactions on Signal Processing, vol. 58, pp. 1427–1433, 2010.
[9] B. Picinbono, "On circularity," IEEE Transactions on Signal Processing, vol. 42, pp. 3473–3482, 1994.
[10] F. D. Neeser and J. L. Massey, "Proper complex random processes with applications to information theory," IEEE Transactions on Information Theory, vol. 39, no. 4, pp. 1293–1302, July 1993.
[11] M. K. Varanasi and B. Aazhang, "Parametric generalized Gaussian density estimation," J. Acoust. Soc. Am., vol. 86, pp. 1404–1415, 1989.
[12] D. P. Bertsekas, Nonlinear Programming. Belmont, Massachusetts: Athena Scientific, 1995.
[13] K. Kumatani, J. McDonough, B. Rauch, P. N. Garner, W. Li, and J. Dines, "Maximum kurtosis beamforming with the generalized sidelobe canceller," in Proc. Interspeech, Brisbane, Australia, 2008.
[14] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore: The Johns Hopkins University Press, 1996.
[15] T. Gehrig, U. Klee, J. McDonough, S. Ikbal, M. Wölfel, and C. Fügen, "Tracking and beamforming for multiple simultaneous speakers with probabilistic data association filters," in Proc. Interspeech, 2006, pp. 2594–2597.
[16] C. Marro, Y. Mahieux, and K. U. Simmer, "Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering," IEEE Transactions on Speech and Audio Processing, vol. 6, pp. 240–259, 1998.
[17] L. Uebel and P. Woodland, "Improvements in linear transform based speaker adaptation," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001.
[18] E. Warsitz, A. Krueger, and R. Haeb-Umbach, "Speech enhancement with a new generalized eigenvector blocking matrix for application in a generalized sidelobe canceller," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV, U.S.A., 2008.