
Speech Source Localization over Spherical Microphone Array
Lalan Kumar

Electrical Engineering Department


Indian Institute of Technology Kanpur
WISSAP 2015
Jan 4-7, 2015


Presentation Outline
• Why Source Localization?
• My Research Journey : Uniform Linear Array (ULA) to Spherical Microphone Array (SMA)
  ◦ Spherical Coordinate System
  ◦ Uniform Linear Array and Uniform Circular Array (UCA)
  ◦ Data Model in Spatial Domain
  ◦ MUltiple SIgnal Classification (MUSIC) and MUSIC-Group Delay (MGD) Spectrum
• Near-field Source Localization in Spherical Harmonics (SH) Domain
  ◦ Data Model in SH Domain
  ◦ SH-MUSIC, SH-MGD, SH-MVDR
  ◦ Cramér-Rao Bound Analysis
  ◦ Experiments on Source Localization
• Conclusion

Why Source Localization?


• Applications include distant speech recognition, speech enhancement, assistive living¹ and rendering of spatial audio.
• A session on Speaker Localization was included in INTERSPEECH 2014.

¹ Pranjal Agrawal, Aseem Kushwah, Lalan Kumar, and Rajesh M. Hegde, "On the Rapid Prototyping of a Portable Multi Media Acquisition System for Intelligent Meeting Capture," Journal of Signal Processing Systems, vol. 75, no. 3, pp. 233-243, 2014.

My Research Journey : ULA to SMA

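Spherical Coordinate System

• The location of a source is given by $\mathbf{r} = (r, \Psi)$, with $\Psi = (\theta, \phi)$.
• The range ($r$), elevation ($\theta$) and azimuth ($\phi$) take values $r \in (0, \infty)$, $\theta \in [0, \pi]$, $\phi \in [0, 2\pi]$.

[Figure: spherical coordinate system]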
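To make the coordinate convention concrete, here is a minimal standard-library sketch that maps a location (r, θ, φ) to Cartesian coordinates and back. It assumes θ is measured from the +z axis and φ from the +x axis in the x-y plane, which is consistent with the wavevector k_l used in the data model; the function names are illustrative only.

```python
import math


def sph_to_cart(r, theta, phi):
    """Map (range r, elevation theta, azimuth phi) to Cartesian (x, y, z).

    theta is measured from the +z axis (0..pi) and phi from the +x axis
    in the x-y plane (0..2*pi), matching the ranges stated above.
    """
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def cart_to_sph(x, y, z):
    """Inverse mapping; azimuth is wrapped into [0, 2*pi)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x) % (2 * math.pi)
    return r, theta, phi


# A source 1 m away at elevation 60 deg and azimuth 30 deg.
x, y, z = sph_to_cart(1.0, math.radians(60), math.radians(30))
print(round(x, 4), round(y, 4), round(z, 4))  # prints: 0.75 0.433 0.5
```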



Linear and Planar Arrays

[Figure panels: Uniform Linear Array geometry (microphones M0, M1, ... along the X axis); Front-back ambiguity in ULA (sources S1 and S2); Uniform circular array]



Data Model in Spatial Domain
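• A sound field of $L$ far-field sources with wavenumber $k$ is incident on a microphone array of $I$ microphones.
• In the spatial domain, the sound pressure $\mathbf{p}(k) = [p_1(k), p_2(k), \ldots, p_I(k)]^T$ is written as
$$ \mathbf{p}(k) = \mathbf{V}(\Psi, k)\,\mathbf{s}(k) + \mathbf{n}(k) \qquad (1) $$
• $\mathbf{V}(\Psi, k)$ is the $I \times L$ steering matrix, $\mathbf{s}(k)$ is the $L \times 1$ vector of signal amplitudes and $\mathbf{n}(k)$ is the $I \times 1$ vector of zero-mean, uncorrelated sensor noise.
• The steering matrix $\mathbf{V}(\Psi, k)$ is expressed as
$$ \mathbf{V}(\Psi, k) = [\mathbf{v}_1(\Psi_1, k), \mathbf{v}_2(\Psi_2, k), \ldots, \mathbf{v}_L(\Psi_L, k)] \qquad (2) $$
  where
$$ \mathbf{v}_l(\Psi_l, k) = [e^{-j\mathbf{k}_l^T\mathbf{r}_1}, e^{-j\mathbf{k}_l^T\mathbf{r}_2}, \ldots, e^{-j\mathbf{k}_l^T\mathbf{r}_I}]^T \qquad (3) $$
• $\mathbf{k}_l = (k\sin\theta_l\cos\phi_l,\ k\sin\theta_l\sin\phi_l,\ k\cos\theta_l)^T$, with $\theta_l = \pi/2$ for ULA.
• $\mathbf{r}_i = ((i-1)d, 0, 0)^T$ for ULA, where $d$ is the inter-microphone spacing, and $\mathbf{r}_i = (r\cos\phi_i, r\sin\phi_i, 0)^T$ for UCA, where $r$ is the array radius.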
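The following NumPy sketch evaluates Eqs. (2)-(3) for the ULA and UCA geometries. Assumptions not taken from the slides: the UCA microphones are equally spaced in azimuth, c = 343 m/s, the spacing, radius and frequency are arbitrary example values, and all function names are hypothetical. The last lines check the front-back ambiguity numerically: a ULA on the x axis produces identical steering vectors for φ and -φ, while a UCA does not.

```python
import numpy as np


def wavevector(k, theta, phi):
    """k_l = (k sin(theta) cos(phi), k sin(theta) sin(phi), k cos(theta))^T."""
    return k * np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])


def ula_positions(num_mics, d):
    """r_i = ((i - 1) d, 0, 0)^T for i = 1..I (one row per microphone)."""
    x = np.arange(num_mics) * d
    return np.column_stack([x, np.zeros(num_mics), np.zeros(num_mics)])


def uca_positions(num_mics, radius):
    """r_i = (r cos(phi_i), r sin(phi_i), 0)^T; equal azimuth spacing is an assumption."""
    phi_i = 2 * np.pi * np.arange(num_mics) / num_mics
    return np.column_stack([radius * np.cos(phi_i), radius * np.sin(phi_i), np.zeros(num_mics)])


def steering_vector(positions, k, theta, phi):
    """Eq. (3): [exp(-j k_l^T r_1), ..., exp(-j k_l^T r_I)]^T."""
    return np.exp(-1j * positions @ wavevector(k, theta, phi))


def steering_matrix(positions, k, directions):
    """Eq. (2): V = [v_1, ..., v_L] for a list of (theta_l, phi_l) pairs."""
    return np.column_stack([steering_vector(positions, k, th, ph) for th, ph in directions])


c, f = 343.0, 1000.0                      # assumed speed of sound (m/s) and frequency (Hz)
k = 2 * np.pi * f / c
ula, uca = ula_positions(4, 0.05), uca_positions(8, 0.1)
V = steering_matrix(ula, k, [(np.pi / 2, np.radians(50)), (np.pi / 2, np.radians(60))])

# Front-back ambiguity: a ULA on the x axis cannot tell phi from -phi, a UCA can.
phi0 = np.radians(50)
print(np.allclose(steering_vector(ula, k, np.pi / 2, phi0), steering_vector(ula, k, np.pi / 2, -phi0)))
print(np.allclose(steering_vector(uca, k, np.pi / 2, phi0), steering_vector(uca, k, np.pi / 2, -phi0)))
```

The two prints give True for the ULA and False for the UCA, which is the motivation for moving from linear to planar (and later spherical) geometries.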



MUSIC and MUSIC-Group Delay Spectrum for Source Localization
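• The MUSIC spectrum for source localization is given by
$$ P_{MUSIC}(\Psi) = \frac{1}{\mathbf{v}^H(\Psi)\,\mathbf{R}_p^{ns}[\mathbf{R}_p^{ns}]^H\,\mathbf{v}(\Psi)} \qquad (4) $$
  where $\mathbf{R}_p^{ns}$ is the noise subspace obtained from the eigenvalue decomposition of the autocorrelation matrix $\mathbf{R}_p = E[\mathbf{p}(k)\mathbf{p}^H(k)]$.
• The MUSIC-Group delay spectrum is given by
$$ P_{MGD}(\Psi) = \Big(\sum_{u=1}^{U} |\nabla \arg(\mathbf{v}^H(\Psi)\,\mathbf{q}_u)|^2\Big)\, P_{MUSIC}(\Psi) \qquad (5) $$
• $U = I - L$, $\nabla$ is the gradient operator, $\arg(\cdot)$ indicates the unwrapped phase, and $\mathbf{q}_u$ represents the $u$-th eigenvector of the noise subspace $\mathbf{R}_p^{ns}$.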
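A compact NumPy sketch of Eqs. (4)-(5) on a simulated ULA with sources at 50 and 60 degrees azimuth (the ULA case in the figures). It assumes a single narrowband frequency bin, a sample covariance in place of the expectation, a 1-D azimuth search with a finite-difference gradient, and invented array, frequency and noise parameters. It illustrates the technique only and is not the author's implementation.

```python
import numpy as np


def noise_subspace(snapshots, num_sources):
    """Return the I x (I - L) noise subspace of the sample covariance R_p = E[p p^H]."""
    num_mics, num_frames = snapshots.shape
    R = snapshots @ snapshots.conj().T / num_frames
    _, eigvecs = np.linalg.eigh(R)                 # eigenvalues in ascending order
    return eigvecs[:, : num_mics - num_sources]    # U = I - L weakest eigenvectors


def music_spectrum(En, steering):
    """Eq. (4) evaluated for every column (grid point) of `steering`."""
    proj = En.conj().T @ steering                  # q_u^H v(Psi) for all u and grid points
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=0)


def mgd_spectrum(En, steering, grid_step):
    """Eq. (5): squared gradient of the unwrapped phase of v^H q_u, summed over u, times MUSIC."""
    proj = steering.conj().T @ En                  # (grid points) x U, entries v^H(Psi) q_u
    phase = np.unwrap(np.angle(proj), axis=0)      # unwrap along the search grid
    grad = np.gradient(phase, grid_step, axis=0)   # numerical gradient w.r.t. the angle
    return np.sum(np.abs(grad) ** 2, axis=1) * music_spectrum(En, steering)


def top_peaks(spectrum, grid, count):
    """Grid values of the `count` highest local maxima, sorted."""
    inner = np.where((spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:]))[0] + 1
    best = inner[np.argsort(spectrum[inner])[::-1][:count]]
    return np.sort(grid[best])


# Simulated ULA (theta = pi/2) with two uncorrelated sources at 50 and 60 deg azimuth.
rng = np.random.default_rng(0)
c, f, num_mics, d, frames = 343.0, 4000.0, 12, 0.04, 500
k = 2 * np.pi * f / c
mic_x = np.arange(num_mics) * d


def ula_steering(phi):
    """Columns exp(-j k x_i cos(phi)) for each azimuth in `phi`."""
    return np.exp(-1j * k * np.outer(mic_x, np.cos(phi)))


true_phi = np.radians([50.0, 60.0])
S = (rng.standard_normal((2, frames)) + 1j * rng.standard_normal((2, frames))) / np.sqrt(2)
noise = 0.03 * (rng.standard_normal((num_mics, frames)) + 1j * rng.standard_normal((num_mics, frames)))
P = ula_steering(true_phi) @ S + noise            # Eq. (1) at a single frequency bin

grid = np.radians(np.arange(0.0, 180.25, 0.25))
En = noise_subspace(P, 2)
p_music = music_spectrum(En, ula_steering(grid))
p_mgd = mgd_spectrum(En, ula_steering(grid), grid[1] - grid[0])
print(np.degrees(top_peaks(p_music, grid, 2)))
```

The gradient is taken along the azimuth grid only; for a UCA or SMA the same idea would use the gradient over the 2-D (or 3-D) search grid.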


MUSIC Magnitude and MUSIC Phase Spectrum : ULA and UCA

[Figure: MUSIC magnitude ("MUSIC Magnitude") and MUSIC phase ("MP") spectra plotted over Ele(°) and Azi(°) for the UCA and over Azi(°) for the ULA]
(a) Spectral magnitude of MUSIC for UCA (top) and ULA (bottom). (b) Spectral phase of MUSIC for UCA (top) and ULA (bottom). Sources at (15,50) and (20,60) for UCA; sources at 50 and 60 for ULA.



Group Delay and MUSIC-Group Delay Spectrum : ULA and UCA²

[Figure: standard group delay and MUSIC-Group delay spectra plotted over Ele(°) and Azi(°) for the UCA and over Azi(°) for the ULA]
(a) Standard group delay spectrum of MUSIC for UCA (top) and ULA (bottom). (b) MUSIC-Group delay spectrum for UCA (top) and ULA (bottom).

² Kumar, L.; Tripathy, A.; Hegde, R.M., "Robust Multi-Source Localization Over Planar Arrays Using MUSIC-Group Delay Spectrum," IEEE Transactions on Signal Processing, vol. 62, no. 17, pp. 4627-4636, Sept. 1, 2014. doi: 10.1109/TSP.2014.2337271


Application in DSR

[Block diagram with blocks: Estimate DOA, Compute TDOA, FSB, Train, DSR]

Methods     CTM    S1 (40,19)                  S2 (30,15)
                   T60 (150ms)  T60 (250ms)    T60 (150ms)  T60 (250ms)
MGD                12.98        23.96          11.99        23.58
MUSIC       9.2    14.21        26.01          13.78        25.56
BS-MUSIC           15.02        27.99          15.22        27.32

(The CTM column holds a single value, 9.2.)
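The "Compute TDOA" step can be sketched for a far-field ULA as below. This assumes the source lies in the array plane (θ = π/2), c = 343 m/s and a 16 kHz sampling rate, and the function names are hypothetical. How the delays feed the FSB and DSR blocks is not specified on the slide, so only the DOA-to-delay conversion is shown.

```python
import math


def ula_tdoas(phi_deg, num_mics, spacing, c=343.0):
    """Far-field TDOA (seconds) of microphone i relative to microphone 1 for a ULA on the x axis.

    With theta = pi/2 and the exp(-j k_l^T r_i) convention of Eq. (3), the phase of
    microphone i corresponds to a delay tau_i = (i - 1) * d * cos(phi) / c.
    """
    phi = math.radians(phi_deg)
    return [i * spacing * math.cos(phi) / c for i in range(num_mics)]


def tdoas_to_samples(tdoas, fs):
    """Round each delay to the nearest integer number of samples at rate fs."""
    return [round(t * fs) for t in tdoas]


delays = ula_tdoas(60.0, num_mics=4, spacing=0.1)
print(tdoas_to_samples(delays, fs=16000))  # prints: [0, 2, 5, 7]
```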


Spherical Microphone Array (SMA)³
• The position vector of the $i$-th microphone is $\mathbf{r}_i = (r_a, \Phi_i)$, where $r_a$ is the radius of the spherical array and $\Phi_i = (\theta_i, \phi_i)$.

[Figure panels: (a) Eigenmike photograph; (b) Far-field and Near-field regions]
(a) Spherical microphone array: Eigenmike system. (b) Near-field and far-field regions around a spherical microphone array. The $i$-th microphone is positioned at $\mathbf{r}_i$ and the $l$-th source at $\mathbf{r}_l$.

³ Kumar, L.; Singhal, K.; Hegde, R.M., "Robust source localization and tracking using MUSIC-Group delay spectrum over spherical arrays," CAMSAP 2013, pp. 304-307, 15-18 Dec. 2013.

Near-field Source Localization in SH Domain


Data Model in Spatial Domain

• The pressure at the $i$-th microphone due to the $l$-th source is
$$ \frac{s_l(t - \tau_i(l))}{|\mathbf{r}_i - \mathbf{r}_l|}, \quad \text{with} \quad \tau_i(l) = \frac{|\mathbf{r}_i - \mathbf{r}_l|}{c}, $$
  where $c$ is the speed of sound.
• The total pressure at the $i$-th microphone is
$$ p_i(t) = \sum_{l=1}^{L} \frac{s_l(t - \tau_i(l))}{|\mathbf{r}_i - \mathbf{r}_l|} + n_i(t). \qquad (6) $$
• Taking the Fourier transform, Equation 6 becomes
$$ p_i(f_q) = \sum_{l=1}^{L} \frac{e^{-j2\pi f_q \tau_i(l)}}{|\mathbf{r}_i - \mathbf{r}_l|}\, s_l(f_q) + n_i(f_q), \quad q = 1, \ldots, Q. \qquad (7) $$

⁴ Kumar, L.; Singhal, K.; Hegde, R.M., "Near-field source localization using spherical microphone array," HSCMA 2014, pp. 82-86, 12-14 May 2014.


Data Model in Spatial Domain
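• Dropping $q$, Equation 7 can be rewritten in the wavenumber domain as
$$ p_i(k) = \sum_{l=1}^{L} \frac{e^{-jk|\mathbf{r}_i - \mathbf{r}_l|}}{|\mathbf{r}_i - \mathbf{r}_l|}\, s_l(k) + n_i(k). \qquad (8) $$
• In matrix form, the near-field data model in the spatial domain is
$$ \mathbf{p}(k) = \mathbf{V}(\mathbf{r}, \Psi)\,\mathbf{s}(k) + \mathbf{n}(k) \qquad (9) $$
• The steering matrix is
$$ \mathbf{V}(\mathbf{r}, \Psi) = [\mathbf{v}(\mathbf{r}_1), \mathbf{v}(\mathbf{r}_2), \ldots, \mathbf{v}(\mathbf{r}_L)], \qquad (10) $$
  where
$$ \mathbf{v}(\mathbf{r}_l) = \Big[\frac{e^{-jk|\mathbf{r}_1 - \mathbf{r}_l|}}{|\mathbf{r}_1 - \mathbf{r}_l|}, \ldots, \frac{e^{-jk|\mathbf{r}_I - \mathbf{r}_l|}}{|\mathbf{r}_I - \mathbf{r}_l|}\Big]^T \qquad (11) $$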
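The near-field steering vector of Eq. (11) can be evaluated directly. The sketch below assumes a hypothetical six-microphone layout (octahedron vertices on a 4.2 cm sphere), 1 kHz and c = 343 m/s. It shows the level differences across the array for a close source and checks that, for a distant source, the ratio v_i/v_1 approaches a pure plane-wave phase. With the e^{-j2πfτ} convention of Eq. (7) that phase is exp(+j k u^T (r_i - r_1)), where u is the unit vector towards the source; whether this matches the sign written in Eq. (3) depends on how k_l is oriented.

```python
import numpy as np


def sph_to_cart(r, theta, phi):
    """Cartesian position of (range, elevation from +z, azimuth from +x)."""
    return r * np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])


def nearfield_steering(mic_pos, k, r_l, theta_l, phi_l):
    """Eq. (11): entries exp(-j k |r_i - r_l|) / |r_i - r_l| for every microphone row r_i."""
    dist = np.linalg.norm(mic_pos - sph_to_cart(r_l, theta_l, phi_l), axis=1)
    return np.exp(-1j * k * dist) / dist


# Hypothetical 6-microphone layout on a sphere of radius 4.2 cm (octahedron vertices).
r_a = 0.042
mics = r_a * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
k = 2 * np.pi * 1000.0 / 343.0
theta_l, phi_l = np.radians(60), np.radians(30)

near = nearfield_steering(mics, k, 0.1, theta_l, phi_l)
print(np.abs(near).max() / np.abs(near).min())   # level spread across the array (about 1.8)

# Far away, v_i / v_1 tends to the plane-wave phase exp(+j k u^T (r_i - r_1)).
far = nearfield_steering(mics, k, 1000.0, theta_l, phi_l)
u = sph_to_cart(1.0, theta_l, phi_l)
print(np.allclose(far / far[0], np.exp(1j * k * (mics - mics[0]) @ u), atol=1e-3))
```

The 1/|r_i - r_l| amplitude term is what carries range information in the near field; in the far-field limit only the phase term survives.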



Data Model in Spherical Harmonics Domain
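• The monochromatic spherical wave solution $e^{-jk|\mathbf{r}_i - \mathbf{r}_l|}/|\mathbf{r}_i - \mathbf{r}_l|$ of the wave equation can be written in spherical coordinates as
$$ \frac{e^{-jk|\mathbf{r}_i - \mathbf{r}_l|}}{|\mathbf{r}_i - \mathbf{r}_l|} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} b_n(k, r_a, r_l)\,[Y_n^m(\Psi_l)]^*\,Y_n^m(\Phi_i) \qquad (12) $$
• $b_n(k, r_a, r_l)$ is the $n$-th order near-field mode strength. It is related to the far-field mode strength $b_n(k, r_a)$ as $b_n(k, r_a, r_l) = j^{-(n-1)}\,k\,b_n(k, r_a)\,h_n(kr_l)$.
• The far-field mode strength for an open sphere (virtual sphere) and a rigid sphere [1] is given by
$$ b_n(k, r) = 4\pi j^n j_n(kr), \quad \text{open sphere} \qquad (13) $$
$$ b_n(k, r) = 4\pi j^n \Big(j_n(kr) - \frac{j_n'(kr_a)}{h_n'(kr_a)}\,h_n(kr)\Big), \quad \text{rigid sphere} \qquad (14) $$
  where $j_n$ and $h_n$ are the spherical Bessel and Hankel functions of order $n$, and $(\cdot)'$ denotes the derivative.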
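A standard-library sketch of the mode strengths in Eqs. (13)-(14) and of the near-field relation above. It assumes h_n is the spherical Hankel function of the first kind and evaluates it through its finite closed form; j_n is taken as the real part, which loses accuracy when kr is much smaller than n (a library routine such as SciPy's `spherical_jn` is preferable there). The example values of k, r_a and r_l are assumptions.

```python
import cmath
import math


def sph_hankel1(n, x):
    """Spherical Hankel function of the first kind, h_n(x) = j_n(x) + j y_n(x), via its closed form."""
    total = sum((1j ** s) * math.factorial(n + s)
                / (math.factorial(s) * math.factorial(n - s) * (2.0 * x) ** s)
                for s in range(n + 1))
    return ((-1j) ** (n + 1)) * cmath.exp(1j * x) / x * total


def sph_bessel_j(n, x):
    """j_n(x) = Re{h_n(x)} for real x > 0 (inaccurate when x << n)."""
    return sph_hankel1(n, x).real


def derivative(func, n, x):
    """f_n'(x) = (n / x) f_n(x) - f_{n+1}(x), valid for both j_n and h_n."""
    return (n / x) * func(n, x) - func(n + 1, x)


def farfield_mode_strength(n, k, r, r_a, rigid=False):
    """Eqs. (13)-(14): b_n(k, r) for an open (virtual) or rigid sphere of radius r_a."""
    b = sph_bessel_j(n, k * r)
    if rigid:
        b = b - derivative(sph_bessel_j, n, k * r_a) / derivative(sph_hankel1, n, k * r_a) * sph_hankel1(n, k * r)
    return 4 * math.pi * (1j ** n) * b


def nearfield_mode_strength(n, k, r_a, r_l, rigid=False):
    """b_n(k, r_a, r_l) = j^{-(n-1)} k b_n(k, r_a) h_n(k r_l)."""
    return (1j ** (1 - n)) * k * farfield_mode_strength(n, k, r_a, r_a, rigid) * sph_hankel1(n, k * r_l)


k, r_a, r_l = 2 * math.pi * 1000.0 / 343.0, 0.042, 1.0   # assumed example values
for n in range(5):
    print(n, abs(farfield_mode_strength(n, k, r_a, r_a, rigid=True)),
          abs(nearfield_mode_strength(n, k, r_a, r_l, rigid=True)))
```

On the rigid-sphere surface, the Wronskian identity reduces Eq. (14) to 4π jⁿ · j / ((kr_a)² h_n'(kr_a)), which gives a convenient numerical cross-check of the Bessel and Hankel values.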



Near-field Criterion for SMA
[Figure: magnitude (dB) of the near-field ("Nearfield") and far-field ("Farfield") mode strengths for n = 0 to n = 4, horizontal axis up to Kmax]
Far-field and near-field mode strength for the Eigenmike system. The near-field source is at $r_l = 1$ m and the order is varied from $n = 0$ (top) to $n = 4$ (bottom).

• The near-field criterion for a spherical array is based on the similarity of the near-field mode strength $|b_n(k, r_a, r_l)|$ and the far-field mode strength $|b_n(k, r_a)|$.
• The two functions start behaving similarly at $kr_l \approx N$ for an array of order $N$, as shown in the figure.
• Hence, the near-field condition for a spherical array turns out to be $r_{NF} \approx N/k$ and $r_a \le r_l \le r_{NF}$ [2].
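The near-field condition translates into simple arithmetic. In this sketch the values N = 4, r_a = 4.2 cm (Eigenmike-like, assumed) and c = 343 m/s are assumptions, and the helper names are hypothetical. It returns r_NF ≈ N/k for a given frequency and, conversely, the highest frequency at which a source at range r_l still satisfies k r_l ≤ N.

```python
import math


def nearfield_limit(order, freq_hz, c=343.0):
    """r_NF ~ N / k with k = 2*pi*f / c."""
    return order / (2 * math.pi * freq_hz / c)


def is_nearfield(r_l, r_a, order, freq_hz, c=343.0):
    """Near-field condition r_a <= r_l <= r_NF."""
    return r_a <= r_l <= nearfield_limit(order, freq_hz, c)


def max_nearfield_freq(r_l, order, c=343.0):
    """Highest frequency with k r_l <= N, i.e. f <= N c / (2 pi r_l)."""
    return order * c / (2 * math.pi * r_l)


N, r_a = 4, 0.042                                   # assumed array order and radius
print(round(nearfield_limit(N, 1000.0), 3))         # prints: 0.218
print(is_nearfield(0.06, r_a, N, 1000.0))           # prints: True
print(round(max_nearfield_freq(1.0, N)))            # prints: 218
```

Under these assumptions, a source at r_l = 1 m (as in the figure) reaches k r_l = N at about 218 Hz, whereas the 6-8 cm sources used later stay in the near field up to a few kHz.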



Spherical Harmonics
• $Y_n^m$ represents the spherical harmonic of order $n$ and degree $m$, given by
$$ Y_n^m(\theta, \phi) = \sqrt{\frac{(2n+1)(n-m)!}{4\pi(n+m)!}}\,P_n^m(\cos\theta)\,e^{jm\phi}, \quad 0 \le n \le N,\ -n \le m \le n \qquad (15) $$
  where $P_n^m$ are the associated Legendre functions.
• Spherical harmonics plot: $Y_0^0$, $Y_1^0$, $Y_1^1$ [figure]
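Eq. (15) can be evaluated with the standard library alone, as in the sketch below. The Condon-Shortley phase in P_n^m is an assumed convention (the slide does not state one), and the helper names are hypothetical; the last helper builds a row y(Φ_i) in the ordering used for the steering matrix in the SH domain.

```python
import cmath
import math


def assoc_legendre(n, m, x):
    """P_n^m(x) with the Condon-Shortley phase (-1)^m (assumed convention), for |m| <= n."""
    if m < 0:
        m = -m
        return ((-1) ** m * math.factorial(n - m) / math.factorial(n + m)) * assoc_legendre(n, m, x)
    somx2 = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for i in range(1, m + 1):                      # P_m^m = (-1)^m (2m-1)!! (1-x^2)^(m/2)
        pmm *= -(2 * i - 1) * somx2
    if n == m:
        return pmm
    pmm1 = x * (2 * m + 1) * pmm                   # P_{m+1}^m
    for ell in range(m + 2, n + 1):                # upward three-term recurrence in the order
        pmm, pmm1 = pmm1, ((2 * ell - 1) * x * pmm1 - (ell + m - 1) * pmm) / (ell - m)
    return pmm1


def sph_harm(n, m, theta, phi):
    """Eq. (15): Y_n^m(theta, phi) with theta the elevation (from +z) and phi the azimuth."""
    norm = math.sqrt((2 * n + 1) * math.factorial(n - m) / (4 * math.pi * math.factorial(n + m)))
    return norm * assoc_legendre(n, m, math.cos(theta)) * cmath.exp(1j * m * phi)


def sh_row(order, theta, phi):
    """Row [Y_0^0, Y_1^-1, Y_1^0, Y_1^1, ..., Y_N^N] evaluated at one direction."""
    return [sph_harm(n, m, theta, phi) for n in range(order + 1) for m in range(-n, n + 1)]


print(sph_harm(1, 0, 0.0, 0.0))                    # Y_1^0 at the north pole, sqrt(3 / (4 pi))
print(len(sh_row(4, 0.5, 0.5)))                    # prints: 25
```

A quick sanity check is the addition theorem: for every n, the sum over m of |Y_n^m|² equals (2n + 1)/(4π) at any direction, independent of the phase convention.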



Spherical Fourier Transform
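• Assuming a continuous distribution of pressure, the spherical Fourier transform (SFT) of the received pressure $p_c(k, r, \theta, \phi)$ at $(r, \theta, \phi)$ is given as [3]
$$ p_{nm}(k, r) = \int_0^{2\pi}\int_0^{\pi} p_c(k, r, \theta, \phi)\,[Y_n^m(\theta, \phi)]^*\,\sin\theta\,d\theta\,d\phi \qquad (16) $$
• Rewriting Equation 16 for a discrete microphone array,
$$ p_{nm}(k, r) \cong \sum_{i=1}^{I} a_i\, p_i(k, r, \Phi_i)\,[Y_n^m(\Phi_i)]^* \qquad (17) $$
• In matrix form, for all $n$ and $m$,
$$ \mathbf{p}_{nm}(k, r) = \mathbf{Y}^H(\Phi)\,\Gamma\,\mathbf{p}(k, r, \Phi) \qquad (18) $$
  where $\Gamma = \mathrm{diag}(a_1, a_2, \ldots, a_I)$ is the matrix of sampling weights.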
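A NumPy sketch of the discrete SFT in Eqs. (17)-(18). The sampling scheme (Gauss-Legendre nodes in cos θ times 2N + 2 equiangular azimuths) is a hypothetical choice made because its weights a_i make Y^H(Φ)ΓY(Φ) = I exact up to order N; it is not the Eigenmike layout. The Condon-Shortley phase is again an assumed convention, and all function names are illustrative.

```python
import numpy as np
from math import factorial


def legendre(n, m, x):
    """P_n^m(x) for 0 <= m <= n on an array x (Condon-Shortley phase assumed)."""
    somx2 = np.sqrt(np.clip(1.0 - x * x, 0.0, None))
    pmm = np.ones_like(x)
    for i in range(1, m + 1):
        pmm = -pmm * (2 * i - 1) * somx2
    if n == m:
        return pmm
    pmm1 = x * (2 * m + 1) * pmm
    for ell in range(m + 2, n + 1):
        pmm, pmm1 = pmm1, ((2 * ell - 1) * x * pmm1 - (ell + m - 1) * pmm) / (ell - m)
    return pmm1


def sh_matrix(order, theta, phi):
    """Y(Phi): one row per microphone, columns ordered (0,0), (1,-1), (1,0), (1,1), ..."""
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = np.sqrt((2 * n + 1) * factorial(n - am) / (4 * np.pi * factorial(n + am)))
            y = norm * legendre(n, am, np.cos(theta)) * np.exp(1j * am * phi)
            cols.append((-1) ** am * np.conj(y) if m < 0 else y)   # Y_n^-m = (-1)^m conj(Y_n^m)
    return np.column_stack(cols)


def gauss_sampling(order):
    """Hypothetical sampling: Gauss-Legendre in cos(theta) times 2N+2 equiangular azimuths."""
    x, w = np.polynomial.legendre.leggauss(order + 1)
    q = 2 * order + 2
    theta = np.repeat(np.arccos(x), q)
    phi = np.tile(2 * np.pi * np.arange(q) / q, order + 1)
    weights = np.repeat(w, q) * (2 * np.pi / q)                     # the a_i of Eq. (17)
    return theta, phi, weights


def sft(p, Y, weights):
    """Eq. (18): p_nm = Y^H Gamma p, with Gamma = diag(a_1, ..., a_I)."""
    return Y.conj().T @ (weights * p)


N = 3
theta, phi, a = gauss_sampling(N)                                   # I = (N+1)(2N+2) = 32 points
Y = sh_matrix(N, theta, phi)
print(Y.shape, np.allclose(Y.conj().T @ (a[:, None] * Y), np.eye((N + 1) ** 2)))
```

With this sampling, transforming a pressure pattern equal to one spherical harmonic returns a unit coefficient in that harmonic's slot and zeros elsewhere, which is the discrete counterpart of the orthogonality used in the SH-domain data model.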




Data Model in Spherical Harmonics Domain
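• Substituting the expression for pressure from Equation 12 (with the sum truncated at order $N$) in Equation 11, the steering matrix in Equation 10 can be written as
$$ \mathbf{V}(\mathbf{r}, \Psi) = \mathbf{Y}(\Phi)\,[\mathbf{B}(r_1)\mathbf{y}^H(\Psi_1), \ldots, \mathbf{B}(r_L)\mathbf{y}^H(\Psi_L)] \qquad (19) $$
• $\mathbf{Y}(\Phi)$ is an $I \times (N+1)^2$ matrix whose $i$-th row is
$$ \mathbf{y}(\Phi_i) = [Y_0^0(\Phi_i), Y_1^{-1}(\Phi_i), Y_1^0(\Phi_i), Y_1^1(\Phi_i), \ldots, Y_N^N(\Phi_i)]. \qquad (20) $$
• The $(N+1)^2 \times (N+1)^2$ matrix $\mathbf{B}(r_l)$ is given by
$$ \mathbf{B}(r_l) = \mathrm{diag}(b_0(k, r_a, r_l), b_1(k, r_a, r_l), b_1(k, r_a, r_l), b_1(k, r_a, r_l), \ldots, b_N(k, r_a, r_l)) \qquad (21) $$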
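Eq. (21) repeats each b_n on the diagonal 2n + 1 times, matching the column order of the SH row vector. A small NumPy sketch, with made-up b_n values standing in for b_n(k, r_a, r_l) and hypothetical helper names, including one SH-domain column B(r_l) y^H(Ψ_l):

```python
import numpy as np


def mode_strength_matrix(b):
    """Eq. (21): diag(b_0, b_1, b_1, b_1, ..., b_N), each b_n repeated 2n + 1 times."""
    return np.diag(np.concatenate([np.full(2 * n + 1, bn, dtype=complex) for n, bn in enumerate(b)]))


def sh_steering_vector(b, y_row):
    """One column B(r_l) y^H(Psi_l), given b_n(k, r_a, r_l) and the SH row y(Psi_l)."""
    return mode_strength_matrix(b) @ np.conj(np.asarray(y_row))


# Hypothetical values for N = 2, only to show the layout.
b = [2.0 + 1.0j, 0.5 - 0.2j, 0.1j]
B = mode_strength_matrix(b)
print(B.shape)                     # prints: (9, 9)
print(np.diag(B))
```

Because B is diagonal, a practical implementation would keep only its diagonal and multiply element-wise; the dense matrix here just mirrors the notation.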



Data Model in Spherical Harmonics Domain
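• Substituting (19) in (9), multiplying both sides by $\mathbf{Y}^H(\Phi)\Gamma$ and utilizing Equation 17, the data model becomes
$$ \mathbf{p}_{nm}(k, r) = \mathbf{Y}^H(\Phi)\Gamma\mathbf{Y}(\Phi)\,[\mathbf{B}(r_1)\mathbf{y}^H(\Psi_1), \ldots, \mathbf{B}(r_L)\mathbf{y}^H(\Psi_L)]\,\mathbf{s}(k) + \mathbf{n}_{nm}(k) \qquad (22) $$
• Orthogonality of spherical harmonics under spatial sampling implies [4] $\mathbf{Y}^H(\Phi)\Gamma\mathbf{Y}(\Phi) = \mathbf{I}$.
• The data model in the spherical harmonics domain therefore becomes
$$ \mathbf{p}_{nm}(k) = [\mathbf{B}(r_1)\mathbf{y}^H(\Psi_1), \ldots, \mathbf{B}(r_L)\mathbf{y}^H(\Psi_L)]\,\mathbf{s}(k) + \mathbf{n}_{nm}(k). \qquad (23) $$
• Rewriting the data model more compactly,
$$ \mathbf{p}_{nm}(k) = \mathbf{V}_{nm}(\mathbf{r}, \Psi)\,\mathbf{s}(k) + \mathbf{n}_{nm}(k) \qquad (24) $$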




SH-MUSIC, SH-MGD, SH-MVDR
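• The near-field spherical harmonics MUSIC (SH-MUSIC) spectrum can now be written as
$$ P_{SH\text{-}MUSIC}(r_s, \Psi_s) = \frac{1}{\mathbf{v}_{nm}^H\,\mathbf{R}_p^{ns}[\mathbf{R}_p^{ns}]^H\,\mathbf{v}_{nm}} \qquad (25) $$
  where $\mathbf{v}_{nm} = \mathbf{B}(r_s)\mathbf{y}^H(\Psi_s)$ is the SH-domain steering vector at the search point $(r_s, \Psi_s)$.
• The Spherical Harmonics MUSIC-Group delay (SH-MGD) spectrum is computed as
$$ P_{SH\text{-}MGD}(r_s, \Psi_s) = \Big(\sum_{u=1}^{U} |\nabla \arg(\mathbf{v}_{nm}^H\,\mathbf{q}_u)|^2\Big)\, P_{SH\text{-}MUSIC}(r_s, \Psi_s) \qquad (26) $$
• The SH-MVDR spectrum for near-field source localization is written as
$$ P_{SH\text{-}MVDR}(r_s, \Psi_s) = \frac{1}{\mathbf{y}(\Psi_s)\,\mathbf{B}^H\,\mathbf{R}_p^{-1}\,\mathbf{B}\,\mathbf{y}^H(\Psi_s)} \qquad (27) $$
  with $\mathbf{B} = \mathbf{B}(r_s)$.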
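Since y(Ψ_s) B^H R_p^{-1} B y^H(Ψ_s) = v^H R_p^{-1} v with v = B(r_s) y^H(Ψ_s), the SH-MVDR spectrum of Eq. (27) reduces to a generic MVDR evaluation over candidate steering vectors. The sketch below assumes those candidate vectors have already been built for each search point (r_s, Ψ_s); the toy covariance and random candidates are invented for illustration, and the function name is hypothetical.

```python
import numpy as np


def mvdr_spectrum(R, steering):
    """P = 1 / (v^H R^{-1} v) for each column v of `steering` (Eq. 27 with v = B y^H)."""
    Rinv_v = np.linalg.solve(R, steering)            # R^{-1} v without an explicit inverse
    denom = np.real(np.sum(steering.conj() * Rinv_v, axis=0))
    return 1.0 / denom


# Toy check with unit-norm candidate vectors; column 0 plays the role of the true source.
rng = np.random.default_rng(3)
M, G = 9, 50                                         # (N+1)^2 = 9 for N = 2; 50 candidates
cands = rng.standard_normal((M, G)) + 1j * rng.standard_normal((M, G))
cands /= np.linalg.norm(cands, axis=0)
v0 = cands[:, 0]
R = np.outer(v0, v0.conj()) + 0.01 * np.eye(M)     # one source plus white noise
P = mvdr_spectrum(R, cands)
print(int(np.argmax(P)))                             # prints: 0
```

Using `np.linalg.solve` instead of forming R_p^{-1} explicitly is a numerical-stability choice; for many search points one could equally factor R_p once.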


[Figure: SH-MUSIC, SH-MGD and SH-MVDR spectra, panels (a)-(f), plotted over Range (m) vs. Elevation (°) and over Elevation (°) vs. Azimuth (°); data tips mark peaks at (60, 0.06) and (55, 0.08) in the range-elevation plots and at (60, 30) and (55, 40) in the elevation-azimuth plots]
The sources are at (0.06 m, 60, 30) and (0.08 m, 55, 40) with SNR 10 dB.


Near-field Source Localization : 4D Scatter Plot for source at (0.06 m, 60, 30)

[Figure: 4D scatter plot over Elevation (°), Azimuth (°) and Range (m), with the fourth dimension shown on a color scale; the peak is marked at X: 60, Y: 30, Z: 0.06]



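Cramér-Rao Bound Analysis
• The unknown parameter vector is $\alpha = [\mathbf{r}^T\ \boldsymbol{\theta}^T\ \boldsymbol{\phi}^T]^T$, with $\mathbf{r} = [r_1 \cdots r_L]^T$, $\boldsymbol{\theta} = [\theta_1 \cdots \theta_L]^T$ and $\boldsymbol{\phi} = [\phi_1 \cdots \phi_L]^T$.
• Based on the Cramér-Rao bound (CRB) expression for the far-field case⁵, the CRB for the near-field case can be obtained using the following Fisher information matrix (FIM) elements:
$$ \mathbf{F}_{rr} = 2\,\mathrm{Re}\Big\{(\mathbf{R}_s\mathbf{V}_{nm}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm}\mathbf{R}_s)^T \odot (\mathbf{V}_{nm_r}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm_r}) + (\mathbf{R}_s\mathbf{V}_{nm}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm_r})^T \odot (\mathbf{R}_s\mathbf{V}_{nm}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm_r})\Big\} \qquad (28) $$
$$ \mathbf{F}_{\theta\theta} = 2\,\mathrm{Re}\Big\{(\mathbf{R}_s\mathbf{V}_{nm}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm}\mathbf{R}_s)^T \odot (\mathbf{V}_{nm_\theta}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm_\theta}) + (\mathbf{R}_s\mathbf{V}_{nm}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm_\theta})^T \odot (\mathbf{R}_s\mathbf{V}_{nm}^H\mathbf{R}_p^{-1}\mathbf{V}_{nm_\theta})\Big\} \qquad (29) $$
  where $\odot$ denotes the Hadamard (element-wise) product, $\mathbf{R}_s$ is the source covariance matrix, and $\mathbf{V}_{nm_r}$, $\mathbf{V}_{nm_\theta}$ are the column-wise derivatives of $\mathbf{V}_{nm}$ with respect to $\mathbf{r}$ and $\boldsymbol{\theta}$.
• The other FIM blocks can be written in a similar way.

⁵ Kumar, L.; Hegde, R.M., "Stochastic Cramér-Rao Bound Analysis for DOA Estimation in Spherical Harmonics Domain," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1030-1034, Aug. 2015. doi: 10.1109/LSP.2014.2381361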
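The Hadamard-product structure of Eqs. (28)-(29) maps directly onto NumPy's element-wise `*`. The sketch computes a single-snapshot FIM block for arbitrary steering and derivative matrices; the snapshot scaling, the finite-difference derivative helper and all names are assumptions, not from the slides. The example uses a far-field ULA with azimuth as the only unknown purely to keep it short; for the near-field SH case, V would be V_nm and the derivative matrices would be taken with respect to r and θ.

```python
import numpy as np


def fim_block(V, D_a, D_b, Rs, Rp):
    """Single-snapshot stochastic FIM block between parameter sets a and b.

    V    : M x L steering matrix (V_nm in the SH domain)
    D_a  : M x L, column l = derivative of the l-th steering vector w.r.t. a_l (e.g. r_l)
    D_b  : same for parameter set b (D_a = D_b gives F_rr or F_thetatheta, Eqs. 28-29)
    Rs   : L x L source covariance; Rp : M x M data covariance
    Multiply by the number of independent snapshots for a multi-snapshot FIM.
    """
    Rp_inv = np.linalg.inv(Rp)
    A = Rs @ V.conj().T @ Rp_inv @ V @ Rs
    C_a = Rs @ V.conj().T @ Rp_inv @ D_a
    C_b = Rs @ V.conj().T @ Rp_inv @ D_b
    term1 = A.T * (D_a.conj().T @ Rp_inv @ D_b)      # '*' is the Hadamard product
    term2 = C_a.T * C_b
    return 2.0 * np.real(term1 + term2)


def derivative_matrix(steer, params, index, h=1e-6):
    """Central differences: column l is d steer(*params[l]) / d params[l][index]."""
    cols = []
    for p in params:
        up, down = list(p), list(p)
        up[index] += h
        down[index] -= h
        cols.append((steer(*up) - steer(*down)) / (2 * h))
    return np.column_stack(cols)


# Example: hypothetical 8-element far-field ULA, two sources, azimuth as the only unknown.
k, d = 2 * np.pi * 2000.0 / 343.0, 0.05
mic_x = np.arange(8) * d


def steer(phi):
    return np.exp(-1j * k * mic_x * np.cos(phi))


params = [(np.radians(50.0),), (np.radians(70.0),)]
V = np.column_stack([steer(*p) for p in params])
D = derivative_matrix(steer, params, index=0)
Rs = np.eye(2)
Rp = V @ Rs @ V.conj().T + 0.1 * np.eye(8)
F = 100 * fim_block(V, D, D, Rs, Rp)               # 100 independent snapshots assumed
crb_deg = np.degrees(np.sqrt(np.diag(np.linalg.inv(F))))
print(crb_deg)
```

The CRB is the inverse of the full FIM assembled from all blocks (including the cross blocks such as F_rθ); when only one parameter set is unknown, as in the example, inverting that single block suffices.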

Cramér-Rao Bound Analysis

[Figure: CRB(r), CRB(θ) and CRB(φ) plotted against SNR (dB)]


Experiments on Source Localization


• The RMSE was computed for sources at (0.06, 30, 45) and (0.08, 40, 50) over 100 iterations.

Comparison of the RMSE in (r, θ) at known φ:

SNR (dB)   Source   SH-MGD           SH-MUSIC          SH-MVDR
-10        S1       (0.001, 0.4)     (0.001, 0.2449)   (0.013, 2.97)
-10        S2       (0, 0.4243)      (0.001, 0.2)      (0.007, 2.05)
-5         S1       (4.47e-04, 0)    (2.0e-04, 0)      (0.0028, 1.0)
-5         S2       (4.9e-04, 0)     (0, 0.1414)       (0.0018, 0.7071)
0          S1       (0, 0)           (0, 0)            (0.001, 0)
0          S2       (0, 0)           (0, 0)            (4.0e-04, 0)
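RMSE pairs like those in the table can be computed from per-trial estimates with a helper such as the one below (standard library only; the helper names and the trial values are invented for illustration and are not the experiment's data).

```python
import math


def rmse(estimates, truth):
    """Root-mean-square error of repeated estimates of one parameter."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def rmse_range_angle(trials, true_r, true_angle):
    """(RMSE in range, RMSE in angle) over independent trials of (r_hat, angle_hat) pairs."""
    return (rmse([t[0] for t in trials], true_r),
            rmse([t[1] for t in trials], true_angle))


# Hypothetical trials for one source at range 0.06 m and angle 30.
trials = [(0.06, 30.0), (0.06, 31.0), (0.061, 30.0), (0.06, 30.0)]
print(rmse_range_angle(trials, 0.06, 30.0))
```

As a point of arithmetic, when estimates fall on a 1-degree grid, two one-degree misses in 100 trials would give an angle RMSE of sqrt(2/100) ≈ 0.1414.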


Conclusion

• MUSIC-Group delay based source localization has been presented for ULA, UCA and SMA.
• Near-field source localization, with simultaneous estimation of range and bearing, has been utilized for the first time.
• Experimental results on source localization are presented in terms of RMSE.
• Near-field array processing using sparse recovery techniques in the SH domain will be dealt with in future work.


References
[1] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, 1999.
[2] E. Fisher and B. Rafaely, "Near-field spherical microphone array processing with radial filtering," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 2, pp. 256-265, 2011.
[3] J. R. Driscoll and D. M. Healy, "Computing Fourier transforms and convolutions on the 2-sphere," Advances in Applied Mathematics, vol. 15, no. 2, pp. 202-250, 1994.
[4] B. Rafaely, "Analysis and design of spherical microphone arrays," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 1, pp. 135-143, 2005.


Thank You
Lalan Kumar
lalank@iitk.ac.in
http://home.iitk.ac.in/~lalank/