Source Localization Over Spherical Microphone Array

Speech Source Localization over Spherical Microphone Array
Lalan Kumar
Electrical Engineering Department

Indian Institute of Technology Kanpur
WISSAP 2015
Jan 4-7, 2015
Presentation Outline
- Why Source Localization?
- My Research Journey: Uniform Linear Array (ULA) to Spherical Microphone Array (SMA)
  - Spherical Coordinate System
  - Uniform Linear Array and Uniform Circular Array (UCA)
  - Data Model in Spatial Domain
  - MUltiple SIgnal Classification (MUSIC) and MUSIC-Group Delay (MGD) Spectrum
- Near-field Source Localization in Spherical Harmonics (SH) Domain
  - Data Model in SH Domain
  - SH-MUSIC, SH-MGD, SH-MVDR
  - Cramér-Rao Bound Analysis
  - Experiments on Source Localization
- Conclusion
Why Source Localization?

- Distant speech recognition, speech enhancement, assistive living¹, rendering spatial audio
- A session on speaker localization was included in INTERSPEECH 2014

¹ Pranjal Agrawal, Aseem Kushwah, Lalan Kumar, and Rajesh M. Hegde, "On the Rapid Prototyping of a Portable Multi Media Acquisition System for Intelligent Meeting Capture," Journal of Signal Processing Systems, vol. 75, no. 3, pp. 233-243, 2014.
My Research Journey : ULA to SMA
Spherical Coordinate system

- The location of a source is given by $\mathbf{r} = (r, \Psi)$, with $\Psi = (\theta, \phi)$.
- The range ($r$), elevation ($\theta$) and azimuth ($\phi$) take values $r \in (0, \infty)$, $\theta \in [0, \pi]$, $\phi \in [0, 2\pi]$.
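
The coordinate convention can be checked numerically. The following is a minimal sketch, assuming elevation is measured from the Z axis and azimuth from the X axis (this matches the wave vector used later in the talk); the function names `sph_to_cart` and `cart_to_sph` are illustrative and do not come from the slides.

```python
import numpy as np

def sph_to_cart(r, theta, phi):
    """(range, elevation from +Z, azimuth from +X) -> Cartesian (x, y, z)."""
    return np.array([r * np.sin(theta) * np.cos(phi),
                     r * np.sin(theta) * np.sin(phi),
                     r * np.cos(theta)])

def cart_to_sph(x, y, z):
    """Cartesian (x, y, z) -> (range, elevation, azimuth in [0, 2*pi))."""
    r = np.sqrt(x * x + y * y + z * z)
    theta = np.arccos(z / r)
    phi = np.arctan2(y, x) % (2 * np.pi)
    return r, theta, phi

# Example source at range 0.06 m, elevation 60 degrees, azimuth 30 degrees.
point = sph_to_cart(0.06, np.deg2rad(60), np.deg2rad(30))
back = cart_to_sph(*point)
print(point, np.rad2deg(back[1:]))
```

With this convention a source at elevation 90 degrees lies in the XY plane, which is the setting used for the linear and circular arrays.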

Linear and Planar Arrays
[Figures: Uniform Linear Array geometry; front-back ambiguity in ULA; Uniform Circular Array. Labels in the figures: sources S1, S2; microphones M0 to M3; axes X, Z.]

Data Model in Spatial Domain
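- A sound field of $L$ far-field sources with wavenumber $k$ is incident on a microphone array of $I$ microphones.
- In the spatial domain, the sound pressure $\mathbf{p}(k) = [p_1(k), p_2(k), \ldots, p_I(k)]^T$ is written as

$$\mathbf{p}(k) = \mathbf{V}(\Psi, k)\,\mathbf{s}(k) + \mathbf{n}(k), \tag{1}$$

- $\mathbf{V}(\Psi, k)$ is the $I \times L$ steering matrix, $\mathbf{s}(k)$ is the $L \times 1$ vector of signal amplitudes, and $\mathbf{n}(k)$ is the $I \times 1$ vector of zero-mean, uncorrelated sensor noise.
- The steering matrix $\mathbf{V}(\Psi, k)$ is expressed as

$$\mathbf{V}(\Psi, k) = [\mathbf{v}_1(\Psi_1, k), \mathbf{v}_2(\Psi_2, k), \ldots, \mathbf{v}_L(\Psi_L, k)], \quad \text{where} \tag{2}$$

$$\mathbf{v}_l(\Psi_l, k) = [e^{j\mathbf{k}_l^T \mathbf{r}_1}, e^{j\mathbf{k}_l^T \mathbf{r}_2}, \ldots, e^{j\mathbf{k}_l^T \mathbf{r}_I}]^T \tag{3}$$

- $\mathbf{k}_l = (k\sin\theta_l\cos\phi_l,\; k\sin\theta_l\sin\phi_l,\; k\cos\theta_l)^T$, with $\theta_l = \pi/2$ for ULA.
- $\mathbf{r}_i = ((i-1)d, 0, 0)^T$ for ULA and $\mathbf{r}_i = (r\cos\phi_i, r\sin\phi_i, 0)^T$ for UCA.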
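
The steering vectors for the two planar geometries can be sketched as follows. This is an illustration only: the wavenumber, spacing, array sizes and the uniform angular spacing of the UCA microphones are assumed values, and all function names are hypothetical. The last lines show the front-back ambiguity of the ULA: azimuths $\phi$ and $2\pi - \phi$ give the same steering vector for a ULA, but not for a UCA.

```python
import numpy as np

def wave_vector(k, theta, phi):
    """k_l = k (sin(theta) cos(phi), sin(theta) sin(phi), cos(theta))^T."""
    return k * np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

def ula_positions(n_mics, d):
    """r_i = ((i - 1) d, 0, 0)^T for i = 1..I."""
    pos = np.zeros((n_mics, 3))
    pos[:, 0] = np.arange(n_mics) * d
    return pos

def uca_positions(n_mics, radius):
    """r_i = (r cos(phi_i), r sin(phi_i), 0)^T; uniform phi_i is an assumption."""
    phi_i = 2 * np.pi * np.arange(n_mics) / n_mics
    return np.column_stack([radius * np.cos(phi_i),
                            radius * np.sin(phi_i),
                            np.zeros(n_mics)])

def steering_vector(positions, k, theta, phi):
    """Far-field steering vector with entries exp(j k_l^T r_i)."""
    return np.exp(1j * positions @ wave_vector(k, theta, phi))

k = 2 * np.pi * 1000.0 / 343.0      # assumed: 1 kHz tone, c = 343 m/s
ula = ula_positions(8, np.pi / k)   # assumed: half-wavelength spacing
uca = uca_positions(8, 0.1)         # assumed: 10 cm radius
theta, phi = np.pi / 2, np.deg2rad(50)

ula_ambiguous = np.allclose(steering_vector(ula, k, theta, phi),
                            steering_vector(ula, k, theta, 2 * np.pi - phi))
uca_ambiguous = np.allclose(steering_vector(uca, k, theta, phi),
                            steering_vector(uca, k, theta, 2 * np.pi - phi))
print(ula_ambiguous, uca_ambiguous)
```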

MUSIC and MUSIC-Group Delay Spectrum for Source Localization
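- The MUSIC spectrum for source localization is given by

$$P_{MUSIC}(\Psi) = \frac{1}{\mathbf{v}^H(\Psi)\,\mathbf{R}_p^{ns}\,[\mathbf{R}_p^{ns}]^H\,\mathbf{v}(\Psi)} \tag{4}$$

  $\mathbf{R}_p^{ns}$ is the noise subspace obtained from the eigenvalue decomposition of the autocorrelation matrix $\mathbf{R}_p = E[\mathbf{p}(k)\mathbf{p}(k)^H]$.
- The MUSIC-Group delay spectrum is given by

$$P_{MGD}(\Psi) = \Big(\sum_{u=1}^{U} |\nabla \arg(\mathbf{v}^H(\Psi)\cdot\mathbf{q}_u)|^2\Big)\cdot P_{MUSIC}(\Psi) \tag{5}$$

- $U = I - L$, $\nabla$ is the gradient operator, $\arg(\cdot)$ indicates the unwrapped phase, and $\mathbf{q}_u$ represents the $u$th eigenvector of the noise subspace $\mathbf{R}_p^{ns}$.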
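
Equations 4 and 5 can be sketched for a ULA as below. This is a hedged illustration rather than the implementation behind the talk: it assumes an 8-microphone ULA with half-wavelength spacing, two unit-power uncorrelated sources at azimuths 50 and 60 degrees, and an exact (not estimated) covariance matrix, so that the result is deterministic. The gradient in Equation 5 is approximated by finite differences on a 1-degree grid, and `top_peaks` is a hypothetical helper.

```python
import numpy as np

N_MICS, KD = 8, np.pi                 # assumed: 8 microphones, k*d = pi
TRUE_DEG = np.array([50.0, 60.0])     # assumed source azimuths
GRID_DEG = np.arange(0.0, 181.0)      # 1-degree search grid

def ula_steering(phi_deg):
    """Columns are ULA steering vectors (elevation fixed at pi/2)."""
    phi = np.deg2rad(np.atleast_1d(phi_deg))
    return np.exp(1j * KD * np.outer(np.arange(N_MICS), np.cos(phi)))

# Exact covariance: unit-power uncorrelated sources, noise power 0.1 (assumed).
V = ula_steering(TRUE_DEG)
R_p = V @ V.conj().T + 0.1 * np.eye(N_MICS)

# Noise subspace: eigenvectors of the U = I - L smallest eigenvalues.
_, eigvecs = np.linalg.eigh(R_p)      # eigenvalues in ascending order
Q_n = eigvecs[:, : N_MICS - len(TRUE_DEG)]

# q_u^H v(phi) for each noise eigenvector (rows) and grid angle (columns).
proj = Q_n.conj().T @ ula_steering(GRID_DEG)

# Eq. (4); the floor only guards against division by an exact zero.
p_music = 1.0 / np.maximum(np.sum(np.abs(proj) ** 2, axis=0), 1e-12)

# Eq. (5): squared gradient of the unwrapped phase of v^H q_u, summed over u.
phase = np.unwrap(np.angle(proj.conj()), axis=1)
grad = np.gradient(phase, np.deg2rad(GRID_DEG), axis=1)
p_mgd = np.sum(grad ** 2, axis=0) * p_music

def top_peaks(spectrum, n_peaks):
    """Grid angles of the n_peaks largest local maxima, sorted ascending."""
    inner = spectrum[1:-1]
    idx = np.flatnonzero((inner > spectrum[:-2]) & (inner >= spectrum[2:])) + 1
    best = idx[np.argsort(spectrum[idx])[::-1][:n_peaks]]
    return np.sort(GRID_DEG[best])

print(top_peaks(p_music, 2), top_peaks(p_mgd, 2))
```

In practice $\mathbf{R}_p$ would be estimated from snapshots, and the peaks would then be finite rather than limited by the numerical floor.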

MUSIC Magnitude and MUSIC phase Spectrum : ULA and UCA
[Figure: (a) Spectral magnitude of MUSIC for UCA (top) and ULA (bottom). (b) Spectral phase of MUSIC for UCA (top) and ULA (bottom). Sources at (15°, 50°) and (20°, 60°) for UCA; sources at 50° and 60° for ULA. Axes: elevation (Ele) and azimuth (Azi).]

Group Delay and MUSIC-Group delay Spectrum : ULA and UCA
[Figure: (a) Standard group delay spectrum of MUSIC for UCA (top) and ULA (bottom). (b) MUSIC-Group delay spectrum for UCA (top) and ULA (bottom). Axes: elevation (Ele) and azimuth (Azi).]

² Kumar, L.; Tripathy, A.; Hegde, R.M., "Robust Multi-Source Localization Over Planar Arrays Using MUSIC-Group Delay Spectrum," IEEE Transactions on Signal Processing, vol. 62, no. 17, pp. 4627-4636, Sept. 1, 2014. doi: 10.1109/TSP.2014.2337271

Application in DSR
[Block diagram: Estimate DOA, Compute TDOA, FSB, DSR, Train]

                      S1 (40°, 19°)                 S2 (30°, 15°)
Methods    CTM    T60 (150 ms)  T60 (250 ms)    T60 (150 ms)  T60 (250 ms)
MGD        9.2    12.98         23.96           11.99         23.58
MUSIC      9.2    14.21         26.01           13.78         25.56
BS-MUSIC   9.2    15.02         27.99           15.22         27.32

Row-group label: MONC. The single CTM value (9.2) spans all three rows.
My Research Journey : ULA to SMA³

Spherical Microphone Array (SMA)
- The position vector of the $i$th microphone is given as $\mathbf{r}_i = (r_a, \Phi_i)$, where $r_a$ is the radius of the spherical array and $\Phi_i = (\theta_i, \phi_i)$.
[Figure: (a) Spherical microphone array: Eigenmike system. (b) Near-field and far-field regions around a spherical microphone array. The $i$th microphone is positioned at $\mathbf{r}_i$ and the $l$th source at $\mathbf{r}_l$.]

³ Kumar, L.; Singhal, K.; Hegde, R.M., "Robust source localization and tracking using MUSIC-Group delay spectrum over spherical arrays," CAMSAP 2013, pp. 304-307, 15-18 Dec. 2013.
Near-field Source Localization in SH Domain

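- Pressure at the $i$th microphone due to the $l$th source is

$$\frac{s_l(t - \tau_i(\mathbf{r}_l))}{|\mathbf{r}_i - \mathbf{r}_l|}, \quad \text{with } \tau_i(\mathbf{r}_l) = \frac{|\mathbf{r}_i - \mathbf{r}_l|}{c},$$

  where $c$ is the speed of sound.
- Total pressure at the $i$th microphone amounts to

$$p_i(t) = \sum_{l=1}^{L} \frac{s_l(t - \tau_i(\mathbf{r}_l))}{|\mathbf{r}_i - \mathbf{r}_l|} + n_i(t). \tag{6}$$

- Taking the Fourier transform, Equation 6 becomes

$$p_i(f_q) = \sum_{l=1}^{L} \frac{e^{-j2\pi f_q \tau_i(\mathbf{r}_l)}}{|\mathbf{r}_i - \mathbf{r}_l|}\, s_l(f_q) + n_i(f_q), \quad q = 1, \ldots, Q. \tag{7}$$

⁴ Kumar, L.; Singhal, K.; Hegde, R.M., "Near-field source localization using spherical microphone array," HSCMA 2014, pp. 82-86, 12-14 May 2014.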

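- Dropping $q$ and using $k = 2\pi f/c$, so that $2\pi f\,\tau_i(\mathbf{r}_l) = k|\mathbf{r}_i - \mathbf{r}_l|$, Equation 7 can be rewritten in the wavenumber domain as

$$p_i(k) = \sum_{l=1}^{L} \frac{e^{-jk|\mathbf{r}_i - \mathbf{r}_l|}}{|\mathbf{r}_i - \mathbf{r}_l|}\, s_l(k) + n_i(k). \tag{8}$$

- In matrix form, the final near-field data model in the spatial domain can be written as

$$\mathbf{p}(k) = \mathbf{V}(\Psi)\,\mathbf{s}(k) + \mathbf{n}(k) \tag{9}$$

- The steering matrix $\mathbf{V}(\Psi)$ is

$$\mathbf{V}(\Psi) = [\mathbf{v}(\Psi_1), \mathbf{v}(\Psi_2), \ldots, \mathbf{v}(\Psi_L)], \quad \text{where} \tag{10}$$

$$\mathbf{v}(\Psi_l) = \left[\frac{e^{-jk|\mathbf{r}_1 - \mathbf{r}_l|}}{|\mathbf{r}_1 - \mathbf{r}_l|}, \ldots, \frac{e^{-jk|\mathbf{r}_I - \mathbf{r}_l|}}{|\mathbf{r}_I - \mathbf{r}_l|}\right]^T \tag{11}$$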
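
The near-field steering vector of Equation 11 can be evaluated directly from microphone and source positions. The sketch below is illustrative: the six microphone positions, the array radius of 0.042 m and the wavenumber are assumed values, and `nearfield_steering` is a hypothetical name. Unlike the far-field case, the entries differ in magnitude as well as phase, which is what makes range observable.

```python
import numpy as np

def sph_to_cart(r, theta, phi):
    """(range, elevation from +Z, azimuth from +X) -> Cartesian (x, y, z)."""
    return np.array([r * np.sin(theta) * np.cos(phi),
                     r * np.sin(theta) * np.sin(phi),
                     r * np.cos(theta)])

def nearfield_steering(mic_xyz, src_xyz, k):
    """Entries exp(-j k |r_i - r_l|) / |r_i - r_l|; also returns the distances."""
    dist = np.linalg.norm(mic_xyz - src_xyz, axis=1)
    return np.exp(-1j * k * dist) / dist, dist

r_a = 0.042                                   # assumed array radius in metres
mics = r_a * np.array([[1, 0, 0], [-1, 0, 0],
                       [0, 1, 0], [0, -1, 0],
                       [0, 0, 1], [0, 0, -1]], dtype=float)  # assumed layout
k = 30.0                                      # assumed wavenumber in rad/m
src = sph_to_cart(0.06, np.deg2rad(60), np.deg2rad(30))

v, dist = nearfield_steering(mics, src, k)
print(np.round(np.abs(v), 2))                 # magnitudes follow 1 / distance
```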

Data Model in Spherical Harmonics Domain
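- The monochromatic spherical-wave solution of the wave equation, $\frac{e^{-jk|\mathbf{r}_i - \mathbf{r}_l|}}{|\mathbf{r}_i - \mathbf{r}_l|}$, can be written in spherical coordinates as

$$\frac{e^{-jk|\mathbf{r}_i - \mathbf{r}_l|}}{|\mathbf{r}_i - \mathbf{r}_l|} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} b_n(k, r_a, r_l)\,[Y_n^m(\Psi_l)]^*\,Y_n^m(\Phi_i) \tag{12}$$

- $b_n(k, r_a, r_l)$ is the $n$th order near-field mode strength. It is related to the far-field mode strength $b_n(k, r_a)$ as $b_n(k, r_a, r_l) = j^{-(n-1)}\,k\,b_n(k, r_a)\,h_n(k r_l)$.
- The far-field mode strength for an open sphere (virtual sphere) and a rigid sphere [1] is given by

$$b_n(k, r) = 4\pi j^n j_n(kr), \quad \text{open sphere} \tag{13}$$

$$\phantom{b_n(k, r)} = 4\pi j^n \left( j_n(kr) - \frac{j_n'(k r_a)}{h_n'(k r_a)}\, h_n(kr) \right), \quad \text{rigid sphere}. \tag{14}$$

  Here $j_n$ is the spherical Bessel function, $h_n$ is the spherical Hankel function, and $'$ denotes the derivative with respect to the argument.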
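
The mode strengths of Equations 13 and 14 and the near-field relation can be evaluated with SciPy's spherical Bessel routines. The sketch below makes two assumptions that the slides do not state: the spherical Hankel function is taken to be of the second kind, and the array radius is set to 0.042 m. Only magnitudes are compared, which do not depend on the Hankel kind. The final lines illustrate that, for $k r_l \gg N$, the near-field magnitude scaled by $r_l$ approaches the far-field magnitude, while for $k r_l < N$ the two differ markedly.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def sph_hankel2(n, x, derivative=False):
    """Spherical Hankel function of the second kind (assumed kind)."""
    return (spherical_jn(n, x, derivative=derivative)
            - 1j * spherical_yn(n, x, derivative=derivative))

def far_field_mode_strength(n, k, r, r_a=None):
    """Eq. (13) when r_a is None (open sphere), otherwise Eq. (14) (rigid sphere)."""
    b = np.asarray(spherical_jn(n, k * r), dtype=complex)
    if r_a is not None:
        ratio = (spherical_jn(n, k * r_a, derivative=True)
                 / sph_hankel2(n, k * r_a, derivative=True))
        b = b - ratio * sph_hankel2(n, k * r)
    return 4 * np.pi * (1j ** n) * b

def near_field_mode_strength(n, k, r, r_l, r_a=None):
    """b_n(k, r_a, r_l) = j^-(n-1) k b_n(k, r_a) h_n(k r_l)."""
    return ((1j ** (-(n - 1))) * k
            * far_field_mode_strength(n, k, r, r_a) * sph_hankel2(n, k * r_l))

r_a, r_l = 0.042, 1.0          # assumed array radius; source range as in the figure
orders = range(5)

def magnitudes(k):
    far = np.array([abs(far_field_mode_strength(n, k, r_a, r_a)) for n in orders])
    near = np.array([abs(near_field_mode_strength(n, k, r_a, r_l, r_a)) for n in orders])
    return far, near

far_hi, near_hi = magnitudes(200.0)   # k * r_l = 200, far above N = 4
far_lo, near_lo = magnitudes(2.0)     # k * r_l = 2, below N = 4
print(np.round(r_l * near_hi / far_hi, 3), np.round(r_l * near_lo / far_lo, 1))
```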

Near-field Criterion for SMA
[Figure: Far-field and near-field mode strength magnitude (dB) versus $k$ for the Eigenmike system. The near-field source is at $r_l = 1$ m and the order is varied from $n = 0$ (top) to $n = 4$ (bottom).]
- The near-field criterion for a spherical array is based on the similarity of the near-field mode strength $|b_n(k, r_a, r_l)|$ and the far-field mode strength $|b_n(k, r_a)|$.
- The two functions start behaving in a similar way at $k r_l \approx N$, for an array of order $N$, as shown in the figure.
- Hence, the near-field condition for a spherical array turns out to be $r_{NF} \approx \frac{N}{k}$ and $r_a \le r_l \le \frac{N}{k}$ [2].
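
As a worked example of the criterion, the near-field extent $N/k$ can be tabulated against frequency. The values below are assumptions chosen for illustration: order $N = 4$, array radius 0.042 m and speed of sound 343 m/s; the function names are hypothetical.

```python
import math

def wavenumber(freq_hz, c=343.0):
    """k = 2 pi f / c, with c an assumed speed of sound in m/s."""
    return 2 * math.pi * freq_hz / c

def near_field_extent(order, k):
    """r_NF = N / k."""
    return order / k

def in_near_field(r_l, r_a, order, k):
    """True when r_a <= r_l <= N / k."""
    return r_a <= r_l <= near_field_extent(order, k)

N, r_a = 4, 0.042
for f in (500.0, 1000.0, 2000.0, 4000.0):
    k = wavenumber(f)
    print(f, round(near_field_extent(N, k), 4), in_near_field(0.06, r_a, N, k))
```

Under these assumptions a source at 0.06 m satisfies the criterion at 1 kHz (extent of about 0.22 m) but not at 4 kHz (extent of about 0.055 m).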

Spherical Harmonics
- $Y_n^m$ represents the spherical harmonic of order $n$ and degree $m$, given by

$$Y_n^m(\theta, \phi) = \sqrt{\frac{(2n+1)(n-m)!}{4\pi(n+m)!}}\; P_n^m(\cos\theta)\, e^{jm\phi}, \quad 0 \le n \le N,\; -n \le m \le n \tag{15}$$

  where $P_n^m$ are the associated Legendre functions.
- Spherical harmonics plot: $Y_0^0$, $Y_1^0$, $Y_1^1$
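
Equation 15 can be evaluated with SciPy's associated Legendre routine `lpmv`. The sketch below is an illustration under one assumption that the slide does not state: the Condon-Shortley phase, which `lpmv` includes, is part of $P_n^m$. Negative degrees are obtained from the symmetry $Y_n^{-m} = (-1)^m [Y_n^m]^*$; the function name `sph_harm_nm` is hypothetical.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def sph_harm_nm(n, m, theta, phi):
    """Y_n^m(theta, phi): theta is elevation from +Z, phi is azimuth."""
    ma = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - ma) / factorial(n + ma))
    y = norm * lpmv(ma, n, np.cos(theta)) * np.exp(1j * ma * phi)
    if m < 0:
        y = (-1) ** ma * np.conj(y)   # symmetry relation for negative degree
    return y

theta, phi = 0.7, 1.1                 # arbitrary test direction in radians
y00 = sph_harm_nm(0, 0, theta, phi)
y10 = sph_harm_nm(1, 0, theta, phi)
y11 = sph_harm_nm(1, 1, theta, phi)
print(y00, y10, y11)
```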

Spherical Fourier Transform
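- Assuming a continuous distribution of pressure, the spherical Fourier transform (SFT) of the received pressure $p_c(k, r, \theta, \phi)$ at $(r, \theta, \phi)$ is given as [3]

$$p_{nm}(k, r) = \int_0^{2\pi}\!\!\int_0^{\pi} p_c(k, r, \theta, \phi)\,[Y_n^m(\theta, \phi)]^*\,\sin(\theta)\,d\theta\,d\phi \tag{16}$$

- Rewriting Equation 16 for a discrete microphone array,

$$p_{nm}(k, r) \cong \sum_{i=1}^{I} a_i\, p_i(k, r, \Phi_i)\,[Y_n^m(\Phi_i)]^* \tag{17}$$

- In matrix form, for all $n$ and $m$, we have

$$\mathbf{p}_{nm}(k, r) = \mathbf{Y}^H(\Phi)\,\Gamma\,\mathbf{p}(k, r, \Phi) \tag{18}$$

  where $\Gamma = \mathrm{diag}(a_1, a_2, \ldots, a_I)$ is the matrix of sampling weights.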
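
The discrete SFT of Equation 18 can be sketched with any sampling scheme whose weights integrate spherical harmonics exactly up to the required order. The example below assumes a Gaussian sampling scheme (Gauss-Legendre nodes in $\cos\theta$, uniform samples in $\phi$), which is one common choice and not necessarily the microphone layout used in the talk. It transforms a synthetic field with two known coefficients and recovers them; all names are hypothetical.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def sph_harm_nm(n, m, theta, phi):
    """Y_n^m with the Condon-Shortley phase (assumed convention)."""
    ma = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - ma) / factorial(n + ma))
    y = norm * lpmv(ma, n, np.cos(theta)) * np.exp(1j * ma * phi)
    return (-1) ** ma * np.conj(y) if m < 0 else y

def sh_matrix(N, theta, phi):
    """I x (N+1)^2 matrix Y(Phi), columns ordered (0,0), (1,-1), (1,0), (1,1), ..."""
    return np.stack([sph_harm_nm(n, m, theta, phi)
                     for n in range(N + 1) for m in range(-n, n + 1)], axis=1)

def gaussian_sampling(N):
    """Directions and weights a_i that are exact for products of harmonics up to order N."""
    x, w = np.polynomial.legendre.leggauss(N + 1)   # nodes in cos(theta)
    n_phi = 2 * (N + 1)
    theta = np.repeat(np.arccos(x), n_phi)
    phi = np.tile(2 * np.pi * np.arange(n_phi) / n_phi, N + 1)
    a = np.repeat(w, n_phi) * (2 * np.pi / n_phi)
    return theta, phi, a

N = 4
theta, phi, a = gaussian_sampling(N)
Y = sh_matrix(N, theta, phi)

# Synthetic pressure: 2 * Y_1^0 + 0.5j * Y_2^{-1}, sampled at the microphones.
p = 2.0 * Y[:, 2] + 0.5j * Y[:, 5]
p_nm = Y.conj().T @ (a * p)          # Eq. (18): Y^H Gamma p
gram = Y.conj().T @ (a[:, None] * Y) # Y^H Gamma Y
print(np.round(p_nm[[2, 5]], 6))
```

The matrix `gram` equals the identity to numerical precision here, which is the sampling orthogonality that the SH-domain data model relies on.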


- Substituting the expression for pressure from Equation 12 in Equation 11, the steering matrix in Equation 10 can be written as

$$\mathbf{V}(\Psi) = \mathbf{Y}(\Phi)\,[\mathbf{B}(r_1)\mathbf{y}^H(\Psi_1), \ldots, \mathbf{B}(r_L)\mathbf{y}^H(\Psi_L)] \tag{19}$$

- $\mathbf{Y}(\Phi)$ is an $I \times (N+1)^2$ matrix. The $i$th row vector can be written as

$$\mathbf{y}(\Phi_i) = [Y_0^0(\Phi_i), Y_1^{-1}(\Phi_i), Y_1^0(\Phi_i), Y_1^1(\Phi_i), \ldots, Y_N^N(\Phi_i)]. \tag{20}$$

- The $(N+1)^2 \times (N+1)^2$ matrix $\mathbf{B}(r_l)$ is given by

$$\mathbf{B}(r_l) = \mathrm{diag}(b_0(k, r_a, r_l), b_1(k, r_a, r_l), b_1(k, r_a, r_l), b_1(k, r_a, r_l), \ldots, b_N(k, r_a, r_l)) \tag{21}$$
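
The index ordering of Equation 20 and the repeated diagonal of Equation 21 can be sketched as follows. Each $b_n$ appears $2n + 1$ times, once per degree $m$. The mode-strength values used here are placeholders chosen only to make the pattern visible, and the function names are hypothetical.

```python
import numpy as np

def sh_index_pairs(N):
    """(n, m) pairs in the order (0,0), (1,-1), (1,0), (1,1), ..., (N,N)."""
    return [(n, m) for n in range(N + 1) for m in range(-n, n + 1)]

def mode_strength_matrix(b):
    """diag(b_0, b_1, b_1, b_1, ..., b_N) from the per-order values b[0..N]."""
    b = np.asarray(b)
    n = np.arange(b.size)
    return np.diag(np.repeat(b, 2 * n + 1))

b_placeholder = np.array([10.0, 20.0, 30.0])   # placeholder b_0, b_1, b_2
B = mode_strength_matrix(b_placeholder)
print(sh_index_pairs(1))
print(np.diag(B))
```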

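- Substituting (19) in (9), multiplying both sides by $\mathbf{Y}^H(\Phi)\Gamma$ and utilizing Equation 17, the data model becomes

$$\mathbf{p}_{nm}(k, r) = \mathbf{Y}^H(\Phi)\,\Gamma\,\mathbf{Y}(\Phi)\,[\mathbf{B}(r_1)\mathbf{y}^H(\Psi_1), \ldots, \mathbf{B}(r_L)\mathbf{y}^H(\Psi_L)]\,\mathbf{s}(k) + \mathbf{n}_{nm}(k) \tag{22}$$

- Orthogonality of spherical harmonics under spatial sampling suggests [4]

$$\mathbf{Y}^H(\Phi)\,\Gamma\,\mathbf{Y}(\Phi) = \mathbf{I}.$$

- The data model in the spherical harmonics domain turns out to be

$$\mathbf{p}_{nm}(k) = [\mathbf{B}(r_1)\mathbf{y}^H(\Psi_1), \ldots, \mathbf{B}(r_L)\mathbf{y}^H(\Psi_L)]\,\mathbf{s}(k) + \mathbf{n}_{nm}(k). \tag{23}$$

- Rewriting the data model in a more compact way, we have

$$\mathbf{p}_{nm}(k) = \mathbf{V}_{nm}(r, \Psi)\,\mathbf{s}(k) + \mathbf{n}_{nm}(k) \tag{24}$$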


SH-MUSIC, SH-MGD, SH-MVDR
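- The near-field spherical harmonics MUSIC spectrum can now be written as

$$P_{SH\text{-}MUSIC}(r_s, \Psi_s) = \frac{1}{\mathbf{v}_{nm}^H\,\mathbf{R}_p^{ns}\,[\mathbf{R}_p^{ns}]^H\,\mathbf{v}_{nm}} \tag{25}$$

- The Spherical Harmonics MUSIC-Group delay (SH-MGD) spectrum is computed as

$$P_{SH\text{-}MGD}(r_s, \Psi_s) = \Big(\sum_{u=1}^{U} |\nabla \arg(\mathbf{v}_{nm}^H \cdot \mathbf{q}_u)|^2\Big)\cdot P_{SH\text{-}MUSIC}(r_s, \Psi_s) \tag{26}$$

- The SH-MVDR spectrum for near-field source localization is written as

$$P_{MVDR}(r_s, \Psi_s) = \frac{1}{\mathbf{y}(\Psi_s)\,\mathbf{B}^H\,\mathbf{R}_p^{-1}\,\mathbf{B}\,\mathbf{y}^H(\Psi_s)} \tag{27}$$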
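
A joint range and elevation search with the SH-MUSIC spectrum of Equation 25 can be sketched as below. This is an illustration under several assumptions that are not taken from the talk: a rigid sphere of radius 0.042 m, order $N = 4$, wavenumber 30 rad/m, a single source with known azimuth, spatially white noise in the SH domain, an exact covariance matrix, spherical Hankel functions of the second kind, and unit-norm steering vectors during the scan. All function names are hypothetical.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv, spherical_jn, spherical_yn

N, K, R_A = 4, 30.0, 0.042      # assumed order, wavenumber (rad/m), radius (m)

def sph_harm_nm(n, m, theta, phi):
    """Y_n^m with the Condon-Shortley phase (assumed convention)."""
    ma = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - ma) / factorial(n + ma))
    y = norm * lpmv(ma, n, np.cos(theta)) * np.exp(1j * ma * phi)
    return (-1) ** ma * np.conj(y) if m < 0 else y

def hankel2(n, x, derivative=False):
    """Spherical Hankel function of the second kind (assumed kind)."""
    return (spherical_jn(n, x, derivative=derivative)
            - 1j * spherical_yn(n, x, derivative=derivative))

def near_mode_strength(n, r_l):
    """b_n(k, r_a, r_l) for a rigid sphere, evaluated on its surface."""
    x = K * R_A
    ratio = spherical_jn(n, x, derivative=True) / hankel2(n, x, derivative=True)
    b_far = 4 * np.pi * 1j ** n * (spherical_jn(n, x) - ratio * hankel2(n, x))
    return 1j ** (-(n - 1)) * K * b_far * hankel2(n, K * r_l)

def sh_steering(r_l, theta, phi):
    """v_nm = B(r_l) y^H(Psi_l), ordered (0,0), (1,-1), (1,0), (1,1), ..."""
    return np.array([near_mode_strength(n, r_l) * np.conj(sph_harm_nm(n, m, theta, phi))
                     for n in range(N + 1) for m in range(-n, n + 1)])

true_r, true_theta_deg, phi_known = 0.06, 60, np.deg2rad(30)
v0 = sh_steering(true_r, np.deg2rad(true_theta_deg), phi_known)
R_p = np.outer(v0, v0.conj()) + 0.1 * np.eye(v0.size)   # exact covariance, L = 1
_, vecs = np.linalg.eigh(R_p)                           # ascending eigenvalues
Q_n = vecs[:, :-1]                                      # noise subspace, U = 24

ranges = np.round(np.arange(0.05, 0.1001, 0.005), 3)    # metres
thetas_deg = np.arange(0, 91, 5)
spectrum = np.empty((ranges.size, thetas_deg.size))
for a, r in enumerate(ranges):
    for b, t in enumerate(thetas_deg):
        v = sh_steering(r, np.deg2rad(t), phi_known)
        v = v / np.linalg.norm(v)
        spectrum[a, b] = 1.0 / max(np.linalg.norm(Q_n.conj().T @ v) ** 2, 1e-12)

ia, ib = np.unravel_index(np.argmax(spectrum), spectrum.shape)
est_r, est_theta_deg = ranges[ia], thetas_deg[ib]
print(est_r, est_theta_deg)
```

The range becomes observable because $h_n(k r_l)$ scales each order $n$ differently, so steering vectors for different ranges are not parallel.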
[Figure: SH-MUSIC, SH-MGD and SH-MVDR spectra, panels (a) to (f), plotted over range (m) and elevation, and over azimuth and elevation. The sources are at (0.06 m, 60°, 30°) and (0.08 m, 55°, 40°) with SNR 10 dB.]
Near-field Source Localization : 4D Scatter Plot for source at (0.06m,60,30)

[Figure: 4D scatter plot over azimuth, elevation and range (m), with the data cursor at X: 60, Y: 30, Z: 0.06.]

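Cramér-Rao Bound Analysis
- The unknown parameter vector is $[\mathbf{r}^T\ \boldsymbol{\theta}^T\ \boldsymbol{\phi}^T]^T$, with $\mathbf{r} = [r_1 \cdots r_L]^T$, $\boldsymbol{\theta} = [\theta_1 \cdots \theta_L]^T$ and $\boldsymbol{\phi} = [\phi_1 \cdots \phi_L]^T$.
- Based on the Cramér-Rao bound (CRB) expression for the far-field case⁵, the CRB expression for the near-field case can be obtained using the following Fisher information matrix (FIM) elements:

$$\mathbf{F}_{r\theta} = 2\,\mathrm{Re}\Big\{ (\mathbf{R}_s \mathbf{V}_{nm}^H \mathbf{R}_p^{-1} \mathbf{V}_{nm} \mathbf{R}_s)^T \odot (\dot{\mathbf{V}}_{nm_r}^H \mathbf{R}_p^{-1} \dot{\mathbf{V}}_{nm_\theta}) + (\mathbf{R}_s \mathbf{V}_{nm}^H \mathbf{R}_p^{-1} \dot{\mathbf{V}}_{nm_r})^T \odot (\mathbf{R}_s \mathbf{V}_{nm}^H \mathbf{R}_p^{-1} \dot{\mathbf{V}}_{nm_\theta}) \Big\} \tag{28}$$

$$\mathbf{F}_{\theta\theta} = 2\,\mathrm{Re}\Big\{ (\mathbf{R}_s \mathbf{V}_{nm}^H \mathbf{R}_p^{-1} \mathbf{V}_{nm} \mathbf{R}_s)^T \odot (\dot{\mathbf{V}}_{nm_\theta}^H \mathbf{R}_p^{-1} \dot{\mathbf{V}}_{nm_\theta}) + (\mathbf{R}_s \mathbf{V}_{nm}^H \mathbf{R}_p^{-1} \dot{\mathbf{V}}_{nm_\theta})^T \odot (\mathbf{R}_s \mathbf{V}_{nm}^H \mathbf{R}_p^{-1} \dot{\mathbf{V}}_{nm_\theta}) \Big\} \tag{29}$$

- Other blocks of the FIM can be written in a similar way.

⁵ Kumar, L.; Hegde, R.M., "Stochastic Cramér-Rao Bound Analysis for DOA Estimation in Spherical Harmonics Domain," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1030-1034, Aug. 2015. doi: 10.1109/LSP.2014.2381361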
Cramér-Rao Bound Analysis

[Figure: CRB(r), CRB(θ) and CRB(φ) versus SNR (dB).]
Experiments on Source Localization

- RMSE was computed for sources at (0.06 m, 30°, 45°) and (0.08 m, 40°, 50°) over 100 iterations.

Comparison of the RMSE in $(r, \theta)$ at known $\phi$.

SNR (dB)  Source  SH-MGD          SH-MUSIC         SH-MVDR
-10       S1      (0.001, 0.4)    (0.001, 0.2449)  (0.013, 2.97)
-10       S2      (0, 0.4243)     (0.001, 0.2)     (0.007, 2.05)
-5        S1      (4.47e-04, 0)   (2.0e-04, 0)     (0.0028, 1.0)
-5        S2      (4.9e-04, 0)    (0, 0.1414)      (0.0018, 0.7071)
0         S1      (0, 0)          (0, 0)           (0.001, 0)
0         S2      (0, 0)          (0, 0)           (4.0e-04, 0)
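
For reference, the RMSE over a set of trials can be computed as below. The numbers are hypothetical and only illustrate the arithmetic: on a 1-degree search grid, two trials out of 100 that miss by one grid step give an RMSE of about 0.1414. This is not a statement about how the tabulated values were produced.

```python
import numpy as np

def rmse(estimates, truth):
    """Root-mean-square error of the estimates about the true value."""
    err = np.asarray(estimates, dtype=float) - truth
    return float(np.sqrt(np.mean(err ** 2)))

true_theta = 45.0                        # hypothetical true angle in degrees
estimates = np.full(100, true_theta)     # 100 hypothetical trials
estimates[:2] += 1.0                     # two trials off by one grid step
theta_rmse = rmse(estimates, true_theta)
print(round(theta_rmse, 4))
```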
Conclusion
- MUSIC-Group delay based source localization has been presented for ULA, UCA and SMA.
- Near-field source localization with simultaneous estimation of range and bearing has been carried out for the first time.
- Source localization experiments are reported in terms of RMSE.
- Near-field array processing using sparse recovery techniques in the SH domain will be dealt with in future.
References
[1] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Access Online via Elsevier, 1999.
[2] E. Fisher and B. Rafaely, "Near-field spherical microphone array processing with radial filtering," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 2, pp. 256-265, 2011.
[3] J. R. Driscoll and D. M. Healy, "Computing Fourier transforms and convolutions on the 2-sphere," Advances in Applied Mathematics, vol. 15, no. 2, pp. 202-250, 1994.
[4] B. Rafaely, "Analysis and design of spherical microphone arrays," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 1, pp. 135-143, 2005.
Thank You
Lalan Kumar
lalank@iitk.ac.in
http://home.iitk.ac.in/~lalank/
