
Microphone Array Processing for Acoustic Source

Localization in Spatial and Spherical Harmonics Domain

A Thesis Submitted

In Partial Fulfillment of the Requirements

for the Degree of

Doctor of Philosophy
by

Lalan Kumar

To the

Department of Electrical Engineering

INDIAN INSTITUTE OF TECHNOLOGY KANPUR

March, 2015
Abstract

With increased computational power and the evolution of compact device technology, microphone arrays are used in everything from handheld devices such as mobile phones to large-scale defense equipment. Source localization is a central problem in microphone array signal processing, and it becomes even more challenging in the presence of noise, reverberation and sensor array ambiguities. In this thesis, novel methods for acoustic source localization are proposed in the spatial and spherical harmonics domains.
In the context of spatial domain signal processing, a high-resolution method that utilizes the phase of MUltiple SIgnal Classification (MUSIC) is proposed for far-field source localization over a planar array. This method computes the group delay of MUSIC and is hence called the MUSIC-Group delay (MGD) method. In contrast to the standard MUSIC method, the MUSIC-Group delay method is able to resolve closely spaced sources with a minimal number of sensors, even in a reverberant environment.
Signal processing in the spherical harmonics domain provides ease of beampattern steering and a unified formulation for a wide range of array configurations. Both far-field and near-field source localization problems are addressed in the spherical harmonics (SH) domain. The MUSIC-Group delay method is formulated in the spherical harmonics domain (called SH-MGD) to resolve the spatial ambiguity of planar arrays. A search-free algorithm, SH-root-MUSIC, is also proposed for azimuth-only estimation of far-field sources.
A new data model is developed for near-field source localization in the spherical harmonics domain. In particular, three methods, namely SH-MUSIC, SH-MGD and SH-MVDR, that jointly estimate the range and bearing of multiple sources are proposed. A near-field MVDR beampattern analysis is also performed to illustrate the significance of the proposed method. The stochastic Cramér-Rao bound for the far-field and near-field data models is formulated in the spherical harmonics domain to evaluate the location estimators. Several experiments on 3-D source localization are conducted in reverberant and noisy environments. Additionally, experiments are performed on real signals acquired over a spherical microphone array in an anechoic chamber. The comparative performance of the proposed methods is presented in terms of root mean square error, probability of resolution and average error distribution.

Dedicated
To
My Spiritual Master,
His Holiness Radhanath Swami

Acknowledgment

I take this opportunity to thank my counselor, Dr. Makarand Upkare, for turning the direction of my life toward research and teaching. I would like to express my deepest gratitude to my advisor, Dr. Rajesh M. Hegde, for tirelessly inspiring, motivating and guiding me. Without his constant support and accommodating nature, this thesis would not have become a reality. His out-of-the-box thinking, passion for research and timeliness made my research path smoother. My four years of association with him have taught me many great values, which I will cherish throughout my life. I am also thankful to Prof. Harish Karnick and Prof. Pradip Sircar for useful discussions on various occasions.
I gratefully acknowledge the financial support from MHRD, Government of India (2010-2011), and Tata Consultancy Services (TCS) under the TCS research scholarship program (2011-2015). The travel support for national and international conferences from the Government of India, TCS and IIT Kanpur gave me opportunities to explore the world around.
I would also like to extend my appreciation to all my MiPS labmates. In particular, I thank Ardhendu, Kushagra, Waquar, Karan, Sudhir, Sandeep, Sachin and Shreyan, whose presence made this journey full of fun and learning. In addition, I am grateful to Ishtiyaq Husain and Mr. Narendra Singh, who provided all the support needed in my research.
I am indebted to my father, S.L. Baranwal, and my late mother, Shanti Devi, for giving me everything. I am equally indebted to all my brothers, Ashok, Nakul and Sunil, for supporting me in every way. I am particularly thankful to my wife, Mrs. Deepshikha, and our little son, Madhav, for bearing my late nights in the lab. I am grateful to them for being so tolerant and patient about my situation.
Finally, I would like to thank my spiritual master, HH Radhanath Maharaj, and Lord Krishna, who arranged all of this.

Contents

List of Figures xi

List of Tables xvi

List of Symbols xvii

List of Abbreviations xix

1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem Statement and Research Objectives . . . . . . . . . . . . . . . . . . 3
1.3 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Principles of Sound Wave Propagation for Source Localization 7


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 The Spherical Coordinate System . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Acoustic Wave Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Solution to Wave Equation in Cartesian Coordinates . . . . . . . . . . . . . . 10
2.4.1 Plane Wave Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4.2 Spherical Wave Solution . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5 Solution to Wave Equation in Spherical coordinates . . . . . . . . . . . . . . 13
2.5.1 Plane Wave Solution for Rigid Sphere . . . . . . . . . . . . . . . . . . 16
2.5.2 Spherical Wave Solution for Rigid Sphere . . . . . . . . . . . . . . . . 18

2.5.3 Range Criterion for Near-field and Far-field in Source Localization . . 19
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3 Microphone Array Signal Processing Techniques 21


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Geometry of Microphone Array . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.1 Uniform Linear Array . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Uniform Circular Array . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.3 Spherical Microphone Array . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Microphone Array Data Model . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3.1 Spatial Aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3.2 Acoustic Noise and Reverberation . . . . . . . . . . . . . . . . . . . . 29
3.4 Acoustic Source Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Correlation-based Source Localization . . . . . . . . . . . . . . . . . . 31
3.4.1.1 Source Localization using Plain Time Correlation . . . . . . 32
3.4.1.2 Source Localization using Generalized Cross-correlation . . . 33
3.4.2 Beamforming-based Source Localization . . . . . . . . . . . . . . . . . 34
3.4.2.1 Delay-and-Sum Beamforming . . . . . . . . . . . . . . . . . . 35
3.4.2.2 Capon Beamforming . . . . . . . . . . . . . . . . . . . . . . . 36
3.4.2.3 Beampattern Analysis . . . . . . . . . . . . . . . . . . . . . . 38
3.4.3 Subspace-based Source Localization . . . . . . . . . . . . . . . . . . . 39
3.4.3.1 The MUSIC Method . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.3.2 Computing MUSIC Spectrum from Sample Covariance Matrix 42
3.4.3.3 The MUSIC-Group Delay Method . . . . . . . . . . . . . . . 44
3.4.3.4 The MUSIC-Group Delay Method using Shrinkage Estimators 45
3.4.3.5 The root-MUSIC Method . . . . . . . . . . . . . . . . . . . . 46
3.5 Wideband Source Localization . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4 MUSIC-Group Delay Method for Source Localization over Planar Microphone Array 51

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2 The MUSIC-Group Delay Method for Robust Multi-source Localization . . . 52
4.2.1 MUSIC-Group Delay Method for Source Localization over Planar Array 52
4.2.2 Spectral Analysis of the MUSIC-Group Delay Function under Rever-
berant Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.3 Two-dimensional Additive Property of the MUSIC-Group Delay Spec-
trum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Localization Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 Performance under Sensor Perturbation Error . . . . . . . . . . . . . . 60
4.3.2 Cramér-Rao Bound Analysis . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.3 Source Localization Error Analysis under Reverberant Environments . 64
4.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4.1 Experimental Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4.2 Experiments on Speech Enhancement in Multi-source Environment . . 66
4.4.3 Experiments on Perceptual Evaluation of Enhanced Speech . . . . . . 68
4.4.4 Experiments on Distant Speech Recognition . . . . . . . . . . . . . . . 69
4.5 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5 Far-field Source Localization over Spherical Microphone Array 71


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Fundamentals of Spherical Array Processing . . . . . . . . . . . . . . . . . . . 72
5.2.1 The Spherical Fourier Transform . . . . . . . . . . . . . . . . . . . . . 72
5.2.2 Beampattern Analysis in Spherical Harmonics Domain . . . . . . . . . 74
5.3 Microphone Array Data Model in Spherical Harmonics Domain . . . . . . . . 76
5.3.1 Data Model in Spatial Domain . . . . . . . . . . . . . . . . . . . . . . 76
5.3.2 Data Model in Spherical Harmonics Domain . . . . . . . . . . . . . . 78
5.4 Advantage of Array Data Model Formulation in Spherical Harmonics Domain 79
5.4.1 Reduced Dimensionality . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4.2 Frequency Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4.3 Ease of Beamforming . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

5.5 Far-field Source Localization using Spherical Microphone Array . . . . . . . . 81
5.5.1 Spherical Harmonics MVDR Method . . . . . . . . . . . . . . . . . . . 82
5.5.2 Spherical Harmonics MUSIC Method . . . . . . . . . . . . . . . . . . 83
5.5.3 Spherical Harmonics MUSIC-Group Delay Method . . . . . . . . . . . 83
5.5.4 Noise Whitening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.6 Formulation of Stochastic Cramér-Rao Bound for Far-field Sources . . . . . . 85
5.6.1 Existence of the Stochastic CRB in Spherical Harmonics Domain . . . 86
5.6.2 CRB Analysis in Spherical Harmonics Domain . . . . . . . . . . . . . 87
5.7 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.7.1 Experiments on Far-field Source Localization in Noisy Environments . 89
5.7.2 Experiments on Far-field Source Localization in Reverberant Environ-
ment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.7.3 Statistical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.7.4 Experiments on Narrowband Source Tracking . . . . . . . . . . . . . 91
5.8 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6 The Spherical Harmonics root-MUSIC 95


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2 Formulation of root-MUSIC in Spherical Harmonics Domain . . . . . . . . . . 96
6.3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.3.1 Experiments on Source Localization . . . . . . . . . . . . . . . . . . . 100
6.3.2 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.4 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

7 Near-field Source Localization over Spherical Microphone Array 103


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.2 Formulation of Near-field Array Data Model in Spherical Harmonics Domain 104
7.2.1 Near-field Data model in Spatial Domain . . . . . . . . . . . . . . . . 104
7.2.2 Near-field Data model in Spherical Harmonics Domain . . . . . . . . . 106
7.3 Near-field Source Localization in Spherical Harmonics Domain . . . . . . . . 108
7.3.1 Spherical Harmonics MUSIC for Near-field Source Localization . . . . 109

7.3.2 Spherical Harmonics MUSIC-Group Delay Method for Near-field Source
Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.3.3 Spherical Harmonics MVDR Method for Near-field Source Localization 110
7.4 The Near-field MVDR Beampattern Analysis . . . . . . . . . . . . . . . . . . 111
7.5 Cramér-Rao Bound Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.6.1 Experiments on Near-field Source Localization . . . . . . . . . . . . . 115
7.6.1.1 RMSE Analysis of Range Estimation . . . . . . . . . . . . . 115
7.6.1.2 Statistical Analysis of Range Estimation . . . . . . . . . . . 116
7.6.2 Experiments on Joint Range and Bearing Estimation . . . . . . . . . . 117
7.6.2.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . 117
7.6.2.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . 118
7.6.3 Experiments on Interference Suppression using Near-field MVDR Beam-
forming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.7 Summary and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

8 Conclusions and Future Directions 123


8.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Appendices 126

A Stochastic Cramér-Rao Bound in Spherical Harmonics Domain 127


A.1 Formulation of Fisher Information Matrix . . . . . . . . . . . . . . . . . . . . 128
A.2 Computing the Derivative of Spherical Harmonics Function Ynm . . . . . . . . 130

B Computing the Derivative of Near-field Steering Matrix 131

References 133

Publications Related to Thesis Work 145

List of Figures

2.1 Spherical coordinate system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8


2.2 Diagram illustrating general time delay estimation from a traveling plane wave. 11
2.3 Illustration of a traveling spherical wave and associated time delay estimation. 13
2.4 Spherical harmonics plot, Y00 , Y10 , Y11 . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Variation of mode strength bn in dB as a function of kr and n for an open
sphere. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 Plot showing the nature of far-field and near-field mode strength for the Eigen-
mike system. Near-field source is at rl = 1m and order is varied from n = 0
to n = 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.1 Uniform Linear Array geometry. . . . . . . . . . . . . . . . . . . . . . . . . . 22


3.2 Front-back ambiguity in ULA. . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Uniform circular array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Photograph of a spherical microphone array: the Eigenmike system. . . . . 24
3.5 Illustration of various regions in a typical room impulse response (RIR). . . . 30
3.6 Voiced frame of a speech signal of length 512 samples, original signal (top) and
signal delayed by 40 samples (bottom). . . . . . . . . . . . . . . . . . . . . . . 31
3.7 Plain time correlation plots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.8 Generalized cross-correlation (GCC), GCC-Roth and GCC-PHAT plots (top
to bottom) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.9 Beamformer block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.10 Delay-and-sum beamformer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.11 DOA estimation using (a) DSB and (b) MVDR method. A ULA with I = 10
microphones was used for sources located at 20◦ and 60◦ . . . . . . . . . . . . 37
3.12 Delay-and-sum beampattern for ULA with no spatial aliasing for I = 10,
φs = 90◦ and d = 0.5λ (a) in Cartesian coordinates and (b) in polar coordinates. 38
3.13 Delay-and-sum beampattern for ULA under aliasing for I = 10, φs = 90◦ and
d = 2λ (a) in Cartesian coordinates and (b) in polar coordinates. . . . . . . . 39
3.14 Illustration of Delay-and-sum beampattern for UCA with I = 10, Ψs =
(45◦ , 90◦ ), under no spatial aliasing (a) in spherical coordinate system (b)
in rectangular coordinate system . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.15 MUSIC-Magnitude spectrum for DOA 60◦ and 65◦ using 5 sensors (top) and
for 15 sensors (bottom). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.16 MUSIC, Unwrapped phase (of MUSIC) and MUSIC-Group delay spectra for
two sources with azimuth (a) 60◦ and 65◦ , (b) 50◦ and 60◦ . . . . . . . . . . . 45
3.17 Eigenvalue estimation using sample covariance and shrinkage estimator using
10 Sensors for 3 Sources, located at 20◦ , 35◦ and 50◦ . . . . . . . . . . . . . . . 46
3.18 The MUSIC-Magnitude spectrum (Top), the MUSIC-GD spectrum (Middle),
and the MUSIC-GD spectrum with shrinkage estimation (bottom) using 6
sensors for closely spaced sources located at 20◦ and 25◦ , at DRR=20dB. . . 47
3.19 Z-Plane representation of all the roots of root-MUSIC polynomial using 8
sensors for 2 sources with locations 40◦ and 50◦ . . . . . . . . . . . . . . . . . 48

4.1 Spectral magnitude of MUSIC for UCA (top) and ULA (bottom). Sources at
(15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA. Sources at 50◦ and 60◦ for ULA. . . . . . . 53
4.2 Spectral phase of MUSIC for UCA (top) and ULA (bottom). Sources at
(15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA. Sources at 50◦ and 60◦ for ULA. . . . . . . 54
4.3 Illustration of standard group delay of MUSIC and the MUSIC-Group delay
as proposed in this work. (a) Standard group delay spectrum of MUSIC for
UCA (top) and ULA (bottom) (b) MUSIC-Group delay spectrum for UCA
(top) and ULA (bottom). Sources are at (15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA, at
50◦ and 60◦ for ULA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.4 Plots illustrating azimuth and elevation angle as estimated by (a) MUSIC-
Magnitude and (b) MUSIC-Group delay spectrum for sources at (15◦ ,100◦ ) and
(17◦ ,105◦ ), reverberation time 400 ms. MM estimates a single peak at (18◦ ,105◦ ).
MGD estimates two peaks at (19◦ ,100◦ ) and (17◦ ,108◦ ). . . . . . . . . . . . . . 56
4.5 Two dimensional spectral plots for the cascade of two individual DOAs (res-
onators), (a) Source with DOA (15◦ ,60◦ ) (b) Source with DOA (18◦ ,55◦ ) (c)
MUSIC-Magnitude spectrum (d) MUSIC-Group delay spectrum. . . . . . . . 59
4.6 Contour plots of (a) MUSIC-Magnitude spectrum (b) MUSIC-Group delay
spectrum, under sensor perturbation errors. . . . . . . . . . . . . . . . . . . . 62
4.7 Two dimensional scatter plot for localization for the sources at (10◦ ,20◦ ) and
(5◦ ,10◦ ) using (a) MUSIC-Magnitude method and (b) MUSIC-Group delay
method. Reverberation time is 150 ms. SNR is 40 dB. The number of iterations is
500. The red dot indicates the actual DOA. . . . . . . . . . . . . . . . . . . . 64
4.8 Experimental setup in a meeting room with two speakers (S1 and S2) and two
interferers (stationary noise source SN and nonstationary noise source NS).
Sources are located at (17◦ ,35◦ ), (19◦ ,40◦ ), (15◦ ,30◦ ) and (21◦ ,45◦ ) respectively.
Radius of the circular array is 10 cm. . . . . . . . . . . . . . . . . . . . . . . . 65
4.9 Flow diagram illustrating the methodology followed in performance evaluation
for distant speech signal acquired over circular array. . . . . . . . . . . . . . . 66

5.1 Computation of spherical Fourier transform over sphere with radius r = 1 . . 73


5.2 Illustration of the spherical harmonics beampatterns (a) regular beampattern
for order N = 3, (b) regular beampattern for order N = 4, (c) DSB beampattern
for order N = 3 and, (d) DSB beampattern for order N = 4 . . . . . . . . . . 75
5.3 SH-MVDR spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB . . . 82
5.4 SH-MUSIC spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB . . 83
5.5 SH-MGD spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB . . . . 84
5.6 Variation of CRB for elevation (θ) and azimuth (φ) estimation (a) at various
SNR with 300 snapshots, (b) with varying snapshots at SNR 20dB. Source is
located at (20◦ , 50◦ ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5.7 Cumulative RMSE in source angle estimation at various SNRs for two hundred
iterations. The sources are located at (30◦ , 35◦ ) and (50◦ , 60◦ ). . . . . . . . . 90
5.8 Trajectory of elevation angle (θ) followed by the moving source with time for
a fixed azimuth φ = 45◦ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.9 Tracking result for elevation using (a) SH-MUSIC and (b) SH-MGD. The azimuth is
fixed at 45◦ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.10 Average error distribution plot for tracking error using SH-MUSIC and SH-
MGD Method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6.1 Plot of SH-MUSIC illustrating DOA estimation using fourth order Eigenmike
system. Sources are located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB. . . . 97
6.2 Plot of SH-root-MUSIC illustrating the actual DOA estimates (red stars) and
noisy DOA estimates (blue triangles). A fourth order Eigenmike system is
used. Sources are located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB. . . . . . 99
6.3 Probability of resolution plot for two sources with azimuth (40◦ , 80◦ ) and co-
elevation 20◦ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

7.1 Illustration of Near-field and far-field regions around spherical microphone


array. The ith microphone is positioned at ri and lth source at rl . . . . . . . . 105
7.2 Illustration of range and elevation estimation by (a) SH-MUSIC method (b)
SH-MGD method (c) SH-MVDR method for fixed azimuth. Illustration of
elevation and azimuth estimation using (d) SH-MUSIC method (e) SH-MGD
method (f) SH-MVDR method for fixed range. The sources are at (0.06m,60◦ ,30◦ )
and (0.08m,55◦ ,40◦ ) at an SNR of 10dB. . . . . . . . . . . . . . . . . . . . . . 110
7.3 Cramér-Rao bound analysis at various SNR, (a) for random signal (b) for
sinusoidal signal. The source location is (0.08m, 40◦ , 50◦ ). . . . . . . . . . . . 114
7.4 Range estimation performance of SH-MGD, SH-MUSIC and SH-MVDR in
terms of probability of resolution. . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.5 The Eigenmike setup in an anechoic chamber at IIT Kanpur for acquiring
near-field sources. A near-field source is placed at (0.3m, 90◦ , 90◦ ). . . . . . . 117

7.6 Four dimensional scatter plots using, (a) SH-MUSIC for simulated signal, (b)
SH-MGD for simulated signal, (c) SH-MUSIC for signal acquired over SMA
(d) SH-MGD for signal acquired over SMA. A narrowband source with frequency
600Hz, located at (0.3m, 90◦ , 90◦ ) is considered. . . . . . . . . . . . . . . . . . 118
7.7 Illustration of near-field MVDR beampattern. The desired source is at (0.1m, 50◦ , 30◦ ),
and interfering source at (0.3m, 55◦ , 40◦ ). . . . . . . . . . . . . . . . . . . . . 119
7.8 Radial filtering analysis of the proposed near-field MVDR method over a spher-
ical microphone array. (a) Array gain for fixed r = 0.1m. (b) Array gain for
fixed r = 0.3m. (c) Array gain for fixed θ = 30◦ . (d) Array gain for fixed
θ = 40◦ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

List of Tables

4.1 Comparison of average RMSE of various methods with the CRB (illustrated in
the first row) for an azimuth range of 10◦ -150◦ and elevation range of 10◦ -80◦
at T 60 of 200 ms and SNR 10dB. . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Enhancement in SIR (dB), compared for various methods at different rever-
beration times. S1s is the desired speaker, S2s is the competing speaker, S ns is
non-stationary noise source and S sn is stationary noise source. . . . . . . . . 67
4.3 Comparison of perceptual evaluation results using various methods. The re-
sults are compared based on objective measure. . . . . . . . . . . . . . . . . . 68
4.4 Comparison of distant speech recognition performance in terms of WER (in
percentage) at various reverberation time, T60 . . . . . . . . . . . . . . . . . . 69

5.1 Comparison of RMSE of various methods at different reverberation time (T60 ). 90


5.2 Probability of resolution at various SNRs for 200 iterations. Sources are taken
at (30◦ , 35◦ ) and (50◦ , 60◦ ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

6.1 Comparison of RMSE of various source localization methods at different SNR 100

7.1 Cumulative RMSE in range r, at various SNRs for 100 iterations. Sources are
at (0.1m, 30◦ , 45◦ ) and (0.8m, 30◦ , 45◦ ). . . . . . . . . . . . . . . . . . . . . . . 116

List of Symbols

(.)∗ Complex conjugate of scalar (.)


(.)T Transpose of vector or matrix (.)
(.)H Conjugate transpose of matrix or vector (.)
a Steering vector in spatial domain
anm Steering vector in spherical harmonics domain
A Steering matrix in spatial domain
Anm Steering matrix in spherical harmonics domain
B Mode strength matrix
bn (k, r) nth order far-field mode strength
bn (k, r, rl ) nth order near-field mode strength
∗ Convolution
c Speed of sound
d Distance between two consecutive microphones in ULA
h(1)n (kr), hn (kr) Spherical Hankel function of the first kind
h(2)n (kr) Spherical Hankel function of the second kind
i Microphone index
I Number of Microphones
j Unit imaginary number
jn Spherical Bessel functions of the first kind
Jn Bessel function of the first kind
k Wavenumber
k Wavevector
l Source index
L Number of sources
m Degree (of spherical harmonics)
n Order (of spherical harmonics or mode strength)
N Order of spherical microphone array
Ns Number of snapshots
p(r, θ, φ, t) Pressure in space-time domain
P (r, θ, φ, ω) Pressure in frequency domain
Pnm Associated Legendre functions
Pnm Spherical Fourier transform of P
q Noise eigenvector
Qn Noise subspace in spatial domain
Qnm Noise subspace in spherical harmonics domain
Rp Array Covariance matrix
RDnm /RPnm Modal covariance matrix
r Position vector
ra Radius of circular/spherical microphone array
s Source signal amplitude in time domain
S Source signal amplitude in frequency domain
T60 Reverberation time
v Sensor noise
Ts Sampling period
yn Spherical Bessel functions of the second kind
Yn Bessel function of the second kind
Ynm Spherical harmonics of order n and degree m
C The set of all complex numbers
∇ Gradient
τ Time delay
Ψ Angular location of a source
Φ Angular location of a microphone
θ Elevation angle
φ Azimuth angle
λ Wavelength
ζ Confidence interval

List of Abbreviations

2-D Two-dimensional
3-D Three-dimensional
AED Average Error Distribution
BSM Beamspace MUSIC
CRB Cramér-Rao Bound
CTM Close Talk Microphone
DFT Discrete Fourier Transform
DOA Direction of Arrival
DRR Direct to Reverberant energy Ratio
DSB Delay-and-Sum Beamformer
DSR Distant Speech Recognition
ESPRIT Estimation of Signal Parameters using Rotational Invariance Techniques
FFT Fast Fourier Transform
FIM Fisher Information Matrix
FSB Filter Sum Beamformer
GCC Generalized Cross Correlation
IDFT Inverse Discrete Fourier Transform
LCMV Linearly Constrained Minimum Variance
MM MUSIC Magnitude
MUSIC MUltiple SIgnal Classification
MGD MUSIC-Group Delay
MVDR Minimum Variance Distortionless Response
PDF Probability Density Function
PHAT Phase Transform
RIR Room Impulse Response
RMSE Root Mean Square Error
SH Spherical Harmonics
SIR Signal to Interference Ratio
SINR Signal to Interference plus Noise Ratio
SMA Spherical Microphone Array
SMGD MUSIC-Group Delay using Shrinkage Estimator
SNR Signal to Noise ratio
TDOA Time Delay Of Arrival
STFT Short Time Fourier Transform
UCA Uniform Circular Array
ULA Uniform Linear Array
WER Word Error Rate

Chapter 1

Introduction

A microphone array consists of a set of microphones arranged in various geometries to capture the spatial information of a sound source. The spatio-temporal information available at the output of the microphone array can be used to estimate various source parameters or to extract the intended source signal. This has many everyday applications, such as localization and tracking of multiple sources, estimation of the number of sources, noise reduction, echo cancellation and dereverberation. Array signal processing techniques are utilized in all such applications. The underlying array signal processing is capable of providing promising solutions to such day-to-day problems for the following reasons [1].

• It can enhance the signal-to-noise ratio (SNR) of a noise-corrupted signal.

• It can act as a spatial filter, selectively passing the signal from one direction while rejecting signals from all other directions; this is known as beamforming.

• The beam can be electronically steered by applying an appropriate delay to each channel signal, without having to point the array physically.
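The second and third points can be sketched with a minimal narrowband delay-and-sum beamformer for a uniform linear array. The array size, element spacing and source angle below are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def steering_vector(phi_deg, num_mics=8, spacing_wavelengths=0.5):
    # Narrowband ULA steering vector: each element's phase shift follows
    # from the extra path length d*cos(phi) travelled by the plane wave.
    phi = np.deg2rad(phi_deg)
    i = np.arange(num_mics)
    return np.exp(-2j * np.pi * spacing_wavelengths * i * np.cos(phi))

def delay_and_sum_power(snapshot, look_angles_deg):
    # Electronic steering: phase-align the channels toward each candidate
    # direction, sum them, and record the normalized output power.
    return np.array([np.abs(np.vdot(steering_vector(a), snapshot)) ** 2
                     for a in look_angles_deg]) / len(snapshot) ** 2

# One noiseless snapshot from a far-field source at 60 degrees azimuth
snapshot = steering_vector(60.0)
angles = np.arange(0, 181)
power = delay_and_sum_power(snapshot, angles)
print(int(angles[np.argmax(power)]))  # beam output peaks at 60
```

The beam is "pointed" purely by changing the phases applied in software, which is the electronic steering referred to in the last bullet.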

With the above three features, microphone arrays are widely used for direction of arrival (DOA) estimation and speech enhancement in telecommunication and robotics. Source localization refers to the azimuth and elevation estimation of far-field sources. For near-field sources, it refers to the estimation of azimuth, elevation and range.

1.1 Motivation

Source localization has applications in acoustic scene analysis, speech separation, distant speech recognition, surveillance, assisted living environments and hands-free communication. Hence, it has been an active area of research, and various algorithms for source localization have been proposed. Minimum Variance Distortionless Response (MVDR) [2] and MUltiple SIgnal Classification (MUSIC) [3] are the most popular nonparametric and parametric methods, respectively. The MUSIC method is widely studied due to its high resolution and computational efficiency. However, the MUSIC method utilizes the magnitude spectrum, which yields a large number of spurious peaks, or a single merged peak for closely spaced sources, when the number of sensors is limited. The group delay spectrum has been used widely in temporal frequency processing for its high-resolution properties [4, 5]. However, group delay has hitherto not been utilized for spatial spectrum analysis. Investigating the MUSIC-Group delay (MGD) spectrum for high-resolution DOA estimation over various array configurations provided the initial motivation for this work.
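The standard MUSIC magnitude spectrum that the group-delay method builds upon can be reproduced with a short narrowband simulation for a half-wavelength ULA. All parameters here (number of microphones, source angles, SNR, snapshot count) are illustrative assumptions; the group-delay variant developed in this thesis additionally differentiates the unwrapped phase of the MUSIC spectrum and is not reproduced in this sketch:

```python
import numpy as np

def music_spectrum(scan_deg, num_mics=10, src_deg=(50.0, 60.0),
                   snapshots=200, noise_amp=0.1, seed=0):
    """Standard narrowband MUSIC pseudospectrum for a half-wavelength ULA."""
    rng = np.random.default_rng(seed)
    idx = np.arange(num_mics)[:, None]
    steer = lambda p: np.exp(-1j * np.pi * idx * np.cos(np.deg2rad(p)))
    A = np.hstack([steer(p) for p in src_deg])               # steering matrix
    S = rng.standard_normal((len(src_deg), snapshots)) + \
        1j * rng.standard_normal((len(src_deg), snapshots))  # source signals
    V = noise_amp * (rng.standard_normal((num_mics, snapshots)) +
                     1j * rng.standard_normal((num_mics, snapshots)))
    X = A @ S + V                                            # array snapshots
    R = X @ X.conj().T / snapshots                           # sample covariance
    _, vecs = np.linalg.eigh(R)                              # ascending order
    Qn = vecs[:, :num_mics - len(src_deg)]                   # noise subspace
    a = steer(np.asarray(scan_deg))                          # mics x angles
    denom = np.sum(np.abs(Qn.conj().T @ a) ** 2, axis=0)
    return 1.0 / denom                                       # MUSIC magnitude

scan = np.arange(0.0, 180.5, 0.5)
P = music_spectrum(scan)
# Pick the two highest local maxima of the spectrum
locs = [k for k in range(1, len(P) - 1) if P[k - 1] < P[k] > P[k + 1]]
top2 = sorted(scan[k] for k in sorted(locs, key=lambda k: P[k])[-2:])
print(top2)  # two sharp peaks close to the true DOAs of 50 and 60 degrees
```

Lowering the sensor count or moving the sources closer together in this sketch reproduces the merged-peak behavior of the magnitude spectrum described above.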
In the literature on source localization, various microphone array configurations have been used, including linear, planar and three-dimensional (3-D) arrays. The source localization problem is simple to formulate for a uniform linear array (ULA). However, a ULA can localize sources only in a plane and exhibits front-back ambiguity. Complexity increases with planar arrays such as the uniform circular array (UCA), but a UCA can localize sources anywhere in the space above the plane of the array. A spherical microphone array (SMA), on the other hand, can localize sources anywhere in space with no spatial ambiguity.
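The front-back ambiguity of the ULA noted above follows directly from the fact that the inter-element delays depend on the arrival direction only through cos φ, so a source at azimuth φ and its mirror image at −φ produce identical array outputs. A short sketch, with illustrative array parameters:

```python
import numpy as np

def ula_steering(phi_deg, num_mics=8, spacing_wavelengths=0.5):
    # The phase at element i depends on direction only through cos(phi),
    # so mirrored directions are indistinguishable to the array.
    phi = np.deg2rad(phi_deg)
    return np.exp(-2j * np.pi * spacing_wavelengths *
                  np.arange(num_mics) * np.cos(phi))

front = ula_steering(60.0)    # source in front of the array
mirror = ula_steering(-60.0)  # mirror-image source behind the array
print(np.allclose(front, mirror))  # True: a ULA cannot tell them apart
```

No such mirrored pair exists for a spherical array, whose elements sample all three spatial dimensions.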
Initial work on spherical arrays can be found in [6], where an approach using spherical harmonic expansion to design beampatterns for continuous spherical arrays is discussed. Other work on discrete spherical arrays can be found in [7, 8, 9]. However, these works relate to antenna arrays, and the techniques used are carried over from linear arrays.
A general approach using spherical harmonics (SH) signal processing was proposed in
[10, 11] for spherical microphone arrays utilizing pressure sensors. Since the introduction
of spherical harmonics signal processing, spherical microphone array research has attracted
considerable attention. In the spherical harmonics domain, the formulation of the source
localization problem and of beamforming becomes simpler, with reduced complexity. The ability
of spherical microphone arrays to measure and analyze 3-D sound fields effectively, and the
ease of signal processing in the spherical harmonics domain, have motivated researchers.
Most source localization algorithms utilizing spherical microphone arrays were proposed
in the last five years [12, 13, 14, 15, 16, 17]. Spherical microphone array signal processing
has thus become an active area of research. It has extensive applications in three-dimensional
sound reception, sound field analysis, teleconferencing, direction of arrival estimation and
noise control. Source localization forms an integral part of these applications. Hence, a
significant part of the thesis focuses on source localization using spherical microphone
arrays.

1.2 Problem Statement and Research Objectives

In the conventional MUSIC method for source localization, the MUSIC magnitude spectrum
is used. The phase spectrum of MUSIC has been utilized for source localization in [18] for
the ULA. The phase spectrum is found to be more robust and to have higher resolution. However,
the negative differential of the unwrapped phase spectrum of MUSIC (the group delay of MUSIC)
is unexplored for source localization. It is to be noted that the group delay function has been
widely used in temporal frequency processing for its high resolution and additive properties
[4, 5]. The additive property of the group delay function for spatial spectrum analysis was
first discussed in [19]. In this thesis, the MUSIC-Group delay spectrum is utilized for source
localization over planar and spherical microphone arrays.
Numerous algorithms have been proposed in the literature for accurate and search-free DOA
estimation using the ULA and UCA. root-MUSIC [20] and Estimation of Signal Parameters via
Rotational Invariance Techniques (ESPRIT) [21] fall under this category. Estimating
DOA from the roots of the MUSIC polynomial is possible because of the Vandermonde structure
of the array manifold. This is made possible in the UCA for azimuth-only estimation using a
beamspace transformation [22]. Utilizing manifold separation to obtain a Vandermonde structure
in the array manifold, and hence azimuth estimation using root-MUSIC in the spherical harmonics
domain, needs to be investigated.
All of the source localization algorithms developed in the spherical harmonics domain deal
with far-field sources only. The far-field assumption greatly simplifies source localization
and beamforming. However, applications involving microphone arrays generally require a
near-field assumption. In applications like the Close Talk Microphone (CTM), teleconferencing,
hands-free telephony and voice-only data entry, the signal source is well within the near-field
range of the array. Using the far-field assumption in the near-field of an array can result in
severe degradation of array performance [23]. Also, important spatial information is lost. A
near-field criterion for the spherical microphone array was formally formulated in [24] in
terms of the range of near-field sources. However, a formal data model for simultaneous
estimation of range and bearing in the spherical harmonics domain is not available in the
literature. Hence, there is a need to develop a near-field data model and source localization
algorithms in the spherical harmonics domain.
The performance of any localization estimator is evaluated against the Cramér-Rao bound
(CRB), which places a lower bound on the variance of an unbiased estimator. Hence, it is also
of interest to develop an expression for the Cramér-Rao bound in the spherical harmonics
domain.
Based on the above discussion, this thesis aims to develop novel methods for source
localization over planar and spherical microphone arrays. In particular, novel methods for
near-field and far-field source localization over the spherical microphone array are proposed.
Cramér-Rao bound formulation and analysis are also presented.

1.3 Organization of the Thesis

The thesis comprises eight chapters and is divided into three parts. Chapters 2 and 3 give
background on wave propagation and spatial array signal processing, respectively. Chapter 4
proposes novel source localization algorithms in the spatial domain. The latter part of the
thesis focuses on signal processing in the spherical harmonics domain. The rest of the thesis
is organized as follows.

• Chapter 2: provides an overview of wave propagation theory. The coordinate system
utilized throughout the thesis and the solution to the wave equation are provided in this
chapter. The solution to the wave equation is given in the Cartesian and spherical coordinate
systems for both the plane wave and spherical wave cases. Scattering from a rigid sphere,
spherical harmonics, and near-field and far-field mode strengths are introduced herein.

• Chapter 3: The fundamentals of array signal processing are discussed here. The
spatio-temporal array data model is derived from first principles of physics. Various
source localization methods are described for linear, planar and spherical microphone
arrays.

• Chapter 4: A robust and high resolution method based on the group delay of MUSIC,
called MUSIC-Group delay, is proposed here for planar arrays. The additive property of
the group delay spectrum of MUSIC is proved. A discussion on the high resolution of the
method is presented. Various experiments are conducted to illustrate its significance.

• Chapter 5: introduces signal processing in the spherical harmonics domain. Far-field
source localization algorithms are discussed in the spherical harmonics domain. In particular,
the MUSIC-Group delay method is formulated in the spherical harmonics domain (called SH-MGD).
The SH-MGD spectrum is utilized for source localization and tracking. Cramér-Rao bound
formulation and analysis are also presented for the proposed methods.

• Chapter 6: proposes spherical harmonics root-MUSIC for azimuth estimation of sources.
Manifold separation is utilized herein to show the Vandermonde structure of the array
manifold. SH-root-MUSIC is a search-free algorithm which estimates DOA by computing the
roots of the SH-MUSIC polynomial.

• Chapter 7: proposes new methods for near-field source localization using the spherical
microphone array. A new data model for near-field source localization is presented. Three
methods that jointly estimate the range and bearing of multiple sources in the spherical
array framework, namely SH-MUSIC, SH-MGD and SH-MVDR, are proposed. Cramér-Rao bound
formulation and analysis in the spherical harmonics domain for near-field sources is also
presented.

• Chapter 8: draws conclusions from the methods proposed in the thesis. Future
directions are also detailed herein.

• Appendix: Appendix A.1 presents a detailed formulation of the CRB in the spherical
harmonics domain. Appendix A.2 computes the derivative of the spherical harmonics. Appendix
B gives the derivative of the near-field steering matrix in the spherical harmonics domain.

1.4 Summary of Contributions

This thesis contributes to the body of knowledge on array signal processing in both the
spatial and spherical harmonics domains. The specific contributions are:
1. Far-field Source Localization in Spatial Domain [25]: The negative differential
of the unwrapped phase spectrum (group delay) of MUSIC is proposed for DOA estimation over
planar arrays. In particular, the MUSIC-Group delay spectrum is utilized for robust source
localization using the uniform circular array. Although the group delay function has been
used widely in temporal frequency processing for its high resolution properties [4], the
additive property of the group delay function has hitherto not been utilized in spatial
spectrum analysis. The significance of the additive property in the context of DOA estimation
is thoroughly studied.
2. Far-field Source Localization in Spherical Harmonics Domain [14]: Two
methods for far-field source localization are proposed in the spherical harmonics domain.
SH-MGD utilizes the advantages of the MUSIC-Group delay spectrum in a spherical harmonics
framework. A search-free algorithm, SH-root-MUSIC, which estimates DOA by computing the
roots of the SH-MUSIC polynomial, is also proposed for azimuth estimation of far-field
sources.
3. Near-field Source Localization in Spherical Harmonics Domain [13]: A new
data model for near-field source localization is formulated in the spherical harmonics
domain. SH-MUSIC, SH-MVDR and SH-MGD are proposed for joint estimation of the range and
bearing of the sources.
4. Formulation and Analysis of Cramér-Rao Bound in Spherical Harmonics
Domain [26]: Expressions for the stochastic Cramér-Rao bound are formulated in the spherical
harmonics domain for both far-field and near-field sources. The existence of the CRB for
the spherical harmonics data model is first verified. Subsequently, an expression for the
stochastic CRB is derived in the spherical harmonics domain.
Chapter 2

Principles of Sound Wave Propagation for Source Localization

2.1 Introduction

The focus of this thesis is the localization of sound sources in space utilizing microphone
arrays. A microphone array is used to measure the acoustic wavefield and extract spatial
information about the sources. Array signal processing algorithms depend on accurate
characterization of the wave through the solution to the wave equation. A brief discussion
of the acoustic wave equation and its solution is provided in this chapter. The solution to
the wave equation governs the propagation of acoustic waves in a medium. The planar and
spherical wave propagation models are extensively utilized in this thesis. This chapter starts
with the coordinate system that will be utilized for defining the location of an acoustic
source and the point of observation.

2.2 The Spherical Coordinate System

In this thesis, the spherical coordinate system is utilized. The coordinate system is
illustrated in Figure 2.1. A position vector is denoted by r = (r, θ, φ)T, where (·)T denotes
the transpose operator. The range r of a source can vary from 0 to ∞ and is measured from the
origin. The angle θ is referred to as the elevation angle and is measured down from the
positive z-axis. The azimuthal angle φ is measured counterclockwise from the positive x-axis.
The ranges of θ and φ are [0, π] and [0, 2π], respectively. In this thesis, the location of a
source is represented as rl = (rl , Ψl ), with Ψl = (θl , φl ). The location of a receiver is
denoted as ri = (ri , Φi ), where Φi = (θi , φi ).

Figure 2.1: Spherical coordinate system.

The spherical coordinates of a point (r, θ, φ)T in space are related to the right-handed
Cartesian coordinates (x, y, z)T by the trigonometric formulae

x = r sin (θ) cos (φ), (2.1)

y = r sin (θ) sin (φ), (2.2)

z = r cos (θ). (2.3)

The position vector r in Cartesian coordinates is given by

r = x î + y ĵ + z k̂ = [r sin θ cos φ,  r sin θ sin φ,  r cos θ]T  (2.4)

where î, ĵ and k̂ are unit vectors in the direction of the x-axis, y-axis and z-axis,
respectively. In the rest of the thesis, a vector will be represented by a column matrix.
Although there are other coordinate systems, the spherical coordinate system will be used
prominently in this thesis.
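The conversions in Equations 2.1-2.4 can be sketched directly in code. The following minimal Python snippet (NumPy assumed) converts a spherical-coordinate triple to the Cartesian position vector and back; the function names are illustrative, not from the thesis.

```python
import numpy as np

def sph_to_cart(r, theta, phi):
    """Equations 2.1-2.3: theta measured down from +z, phi from +x."""
    return np.array([r * np.sin(theta) * np.cos(phi),
                     r * np.sin(theta) * np.sin(phi),
                     r * np.cos(theta)])

def cart_to_sph(x, y, z):
    """Inverse mapping; phi wrapped to [0, 2*pi)."""
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arccos(z / r)               # elevation from +z axis
    phi = np.arctan2(y, x) % (2 * np.pi)   # counterclockwise from +x axis
    return r, theta, phi

p = sph_to_cart(2.0, np.pi / 3, np.pi / 4)
r, theta, phi = cart_to_sph(*p)
```

Round-tripping a point through both mappings recovers the original (r, θ, φ), which is a quick sanity check on the sign conventions above.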
A propagating wave in the space-time domain will be represented by p(r, θ, φ, t). For acoustic
wave propagation, p(r, θ, φ, t) ≡ p(r, t) represents the infinitesimal variation of acoustic
pressure at position r and time t. The pressure is governed by the acoustic wave equation,
discussed in the ensuing section.

2.3 Acoustic Wave Equation

In array signal processing, sound information propagates through the medium to reach the
array. The acoustic pressure field is generated by the propagation of compressions and
rarefactions of the sound wave.
For a traveling wave, the pressure is a function of space and time. Let the infinitesimal
variation of acoustic pressure from its equilibrium value be represented by p(r, t). In a
source-free, homogeneous fluid with no viscosity, the pressure satisfies the following wave
equation [27]

∇²p(r, t) − (1/c²) ∂²p(r, t)/∂t² = 0,  (2.5)

where ∇² represents the Laplacian operator and c is the speed of sound propagation in the
medium. At 20°C, the value of c is 343 m/s in air and 1481 m/s in water. The derivation of
the wave equation in Equation 2.5 follows from basic laws of physics and conservation of
mass. In general, p(r, t) represents a scalar field. The same equation holds for the
electromagnetic field derived from Maxwell's equations; p(r, t) in that case would represent
the electric field E.
The wave equation can also be rewritten in the frequency domain by applying the Fourier
transform to Equation 2.5. Hence, the wave equation in the frequency domain can be expressed
as [27]

∇²P(r, ω) + (ω/c)² P(r, ω) = 0,  (2.6)

where ω = 2πf is the temporal frequency and k = ω/c is termed the wavenumber. The wave
equation in 2.6 is also known as the Helmholtz equation (or reduced wave equation). The
wavenumber k can thus be expressed as

k = ω/c = 2π/λ = 2πf/c,  (2.7)

where λ is the wavelength of the arriving wave. In the ensuing section, a solution to the wave
equation is described.

2.4 Solution to Wave Equation in Cartesian Coordinates

In this section, the solution to the wave equation in Cartesian coordinates is presented.
Equation 2.5 can be written in Cartesian coordinates as

∂²p/∂x² + ∂²p/∂y² + ∂²p/∂z² = (1/c²) ∂²p/∂t².  (2.8)

Perpendicular to the direction of propagation, the wave spreads out to form a wavefront.
The initial shape of the wavefront is generally arbitrary and evolves according to the
solution of the wave equation. In particular, planar and spherical wavefronts are studied.
In the following sections, both the planar and spherical wave solutions are discussed.

2.4.1 Plane Wave Solution

The mathematical model of plane wave propagation is given by the solution to the partial
differential Equation 2.8. For a plane wave, the value of p(r, t0) at a given instant t0 is
constant over all points on a plane perpendicular to the direction of propagation, as shown
in Figure 2.2. Let us consider a source situated in the far-field region with direction
denoted by (θl , φl ). Hence, the plane wave travels in the direction given by
(π − θl , π + φl ). The expression for the wavevector is given as

k = [k sin θ cos φ,  k sin θ sin φ,  k cos θ]T,  (2.9)

where k points in the direction of propagation with magnitude k, the wavenumber. Replacing
(θ, φ) in Equation 2.9 with (π − θl , π + φl ), the expression for the wavevector becomes

kl = −[k sin θl cos φl ,  k sin θl sin φl ,  k cos θl ]T.  (2.10)

The opposite sign indicates that the wavevector in the new definition points along the
direction of arrival and not the direction of propagation, as shown in Figure 2.2.
Assuming the propagation delay at a reference point to be zero, the delay at receiver ri is
denoted by τi (Ψl ). The delay τi (Ψl ), from the geometry shown in Figure 2.2, can be
calculated as

τi (Ψl ) = dil /c = k̂l ·ri /c = kl ·ri /ω,  (2.11)

Figure 2.2: Diagram illustrating general time delay estimation from a traveling plane wave.

where the position vector ri and wavevector kl are given by Equations 2.4 and 2.10,
respectively. Equation 2.11 can also be written as

ω τi (Ψl ) = klT ri .  (2.12)
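As a numerical check on Equations 2.11 and 2.12, the sketch below (with illustrative frequency and receiver position, NumPy assumed) computes the delay τi at a receiver for a plane wave arriving from (θl, φl) and verifies that ω τi equals klT ri.

```python
import numpy as np

c = 343.0                      # speed of sound in air (m/s)
f = 1000.0                     # temporal frequency (Hz), illustrative
omega = 2 * np.pi * f
k = omega / c                  # wavenumber, Eq. 2.7

theta_l, phi_l = np.pi / 3, np.pi / 4   # direction of arrival
# Wavevector of Eq. 2.10: opposite in sign to the unit vector toward the source.
k_l = -k * np.array([np.sin(theta_l) * np.cos(phi_l),
                     np.sin(theta_l) * np.sin(phi_l),
                     np.cos(theta_l)])

r_i = np.array([0.05, 0.02, 0.0])       # receiver position (m), illustrative
tau_i = k_l @ r_i / omega               # delay relative to the origin, Eq. 2.11
```

The delay is independent of frequency (k_l scales with ω), which is why the same geometric delay model serves all narrowband components of a wideband source.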

In order to discuss the plane wave solution to the wave equation, an arbitrary field of the
form below is considered:

p(r, t) = f (t − k̂l ·r/c).  (2.13)

Equation 2.13 satisfies the wave equation; the term k̂l /c is also known as the slowness
vector. It is to be noted that a monochromatic plane wave is defined for a single frequency.
Hence, the monochromatic plane wave solution to the wave equation at frequency ω can be
written as [28, 29]

p(r, t) = A e^{j(ωt − kx x − ky y − kz z)},  (2.14)

where kx , ky , kz , ω are real constants, A is a complex constant and j is the unit imaginary
number. The wavevector kl = [kx , ky , kz ]T consists of spatial frequencies. Each spatial
frequency denotes 2π times the number of cycles per meter of the monochromatic plane wave
in the x, y and z directions. Inserting Equation 2.14 into Equation 2.8 yields the constraint

kx² + ky² + kz² = ω²/c².  (2.15)
This constraint must be satisfied by every monochromatic plane wave solution. The plane wave
solution in Equation 2.14 can also be written compactly as

p(r, t) = A e^{j(ωt − klT r)}.  (2.16)

Taking the Fourier transform of Equation 2.13, the solution can be written in the frequency
domain as

P(r, ω) = e^{−j klT r} F(ω),  (2.17)

where F(ω) is the temporal spectrum (frequency-dependent part) [30]. This is also called the
solution to the Helmholtz equation.

2.4.2 Spherical Wave Solution

Spherical waves are generated when sound is emitted equally in all directions (spherical
symmetry) from the center of a sphere, or when the sound source is highly localized. A
traveling spherical wave is illustrated in Figure 2.3 for a near-field source located at rl .
Assuming spherical symmetry of the wave, the acoustic pressure p is a function of the radial
distance and time, but not of the angular coordinates. In this case, the Laplacian operator
reduces to [31, p. 20-12]

∇² = ∂²/∂r² + (2/r) ∂/∂r.  (2.18)

Hence, the wave equation in 2.5 takes the form

∂²p/∂r² + (2/r) ∂p/∂r = (1/c²) ∂²p/∂t².  (2.19)

This can be simplified to the more convenient form

(1/r) ∂²(rp)/∂r² = (1/c²) ∂²p/∂t².  (2.20)

Hence, the final spherical wave equation is given by

∂²(rp)/∂r² = (1/c²) ∂²(rp)/∂t².  (2.21)


Figure 2.3: Illustration of a traveling spherical wave and associated time delay estimation.

If the product rp is considered as a single term, the solution can be written similarly to
the plane wave solution in Equation 2.13. Hence, the spherical wave model is represented by

p(r, t) = f (t − k̂l ·r/c) / r.  (2.22)

Utilizing Equation 2.11 and the geometry of the near-field shown in Figure 2.3, the time
delay can be computed as

τ = k̂l ·r/c = |ri − rl |/c.  (2.23)

Utilizing Equations 2.22 and 2.23, the acoustic pressure at a point ri due to a source at
rl is given by

p(ri , t) = (1/|ri − rl |) f (t − |ri − rl |/c).  (2.24)

Taking the temporal Fourier transform of Equation 2.24, the final solution to the wave
equation can be written as

P(r, ω) = (e^{−jk|ri − rl |} / |ri − rl |) F(ω).  (2.25)

Similar to Equation 2.16, the monochromatic spherical wave solution is given by

p(r, t) = (A/r) e^{j(ωt − klT r)}.  (2.26)
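A short sketch of the spherical wave model in Equations 2.23-2.24: the pressure at a receiver is the source signal delayed by |ri − rl|/c and attenuated by 1/|ri − rl|. The positions and the source signal below are illustrative, not from the thesis.

```python
import numpy as np

c = 343.0                                   # speed of sound (m/s)

def near_field_pressure(f, t, r_i, r_l):
    """Eq. 2.24: delayed, 1/d-attenuated copy of the source signal f."""
    d = np.linalg.norm(r_i - r_l)           # propagation distance |r_i - r_l|
    return f(t - d / c) / d

# Illustrative monochromatic source signal
freq = 500.0
f = lambda t: np.cos(2 * np.pi * freq * t)

r_l = np.array([1.0, 0.5, 0.2])             # near-field source (m)
r_i = np.array([0.042, 0.0, 0.0])           # microphone (m)
p = near_field_pressure(f, 0.01, r_i, r_l)
```

Unlike the plane wave case, the attenuation differs across microphones at different distances, which is precisely the range information that near-field localization methods exploit.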

2.5 Solution to Wave Equation in Spherical coordinates

In this section, the solution to the wave equation in the spherical coordinate system is
described. The time-dependent wave equation in Equation 2.5 can be written in spherical
coordinates as

(1/r²) ∂/∂r (r² ∂p/∂r) + (1/(r² sin θ)) ∂/∂θ (sin θ ∂p/∂θ) + (1/(r² sin²θ)) ∂²p/∂φ²
  = (1/c²) ∂²p/∂t².  (2.27)
The solution to Equation 2.27 is obtained using separation of variables as

p(r, θ, φ, t) = R(r)Θ(θ)Φ(φ)T(t).  (2.28)

This leads to the following differential equations:

(1/Φ(φ)) d²Φ(φ)/dφ² = −m²,  (2.29)

sin θ d/dθ (sin θ dΘ(θ)/dθ) + [n(n + 1) sin²θ − m²] Θ(θ) = 0,  (2.30)

d/dr (r² dR(r)/dr) + [k²r² − n(n + 1)] R(r) = 0,  (2.31)

(1/T(t)) (1/c²) d²T(t)/dt² = −k².  (2.32)
The solution to Equation 2.32 is

T(t) = T1 e^{jωt} + T2 e^{−jωt}.  (2.33)

The first term is taken for the time dependence, with T2 = 0, since e^{−jωt} represents a
wave propagating backward in time and hence has no significance [29].
The solutions to Equations 2.29 and 2.30 are combined into a single function, called the
spherical harmonics [27]. The spherical harmonics Ynm are defined by

Ynm(θ, φ) = √[ ((2n + 1)/4π) ((n − m)!/(n + m)!) ] Pnm(cos θ) e^{jmφ}.  (2.34)

Here n is a non-negative integer called the order of the spherical harmonics, and m is termed
the degree of the spherical harmonics, taking values −n ≤ m ≤ n. Pnm is the associated
Legendre function of the first kind. The constant term makes the spherical harmonics
orthonormal; it arises from the orthogonality properties of the Legendre functions
Pnm(cos θ) and the exponential functions e^{jmφ} [32, p. 38]. For negative m, the spherical
harmonics satisfy

Yn^{−|m|}(θ, φ) = (−1)^{|m|} [Yn^{|m|}(θ, φ)]*.  (2.35)

Due to the orthonormality of the spherical harmonics, the following relation holds:

∫₀^{2π} ∫₀^{π} Ynm(θ, φ) [Yn′m′(θ, φ)]* sin θ dθ dφ = δnn′ δmm′,  (2.36)

where the Kronecker symbol δmn is defined as

δmn ≜ 1 for m = n, and 0 otherwise.  (2.37)

The spherical harmonics act as basis functions for the spherical harmonics decomposition of
a square-integrable function on the sphere, similar to the complex exponentials e^{jωt}
acting as a basis for the decomposition of periodic functions [33]. Figure 2.4 shows the plot
of three spherical harmonics. The radius shows the magnitude and the color indicates the
phase. It is to be noted that Y00 is isotropic, while Y10 and Y11 have directional
characteristics.

Figure 2.4: Spherical harmonics plot, Y00 , Y10 , Y11 .
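The orthonormality relation in Equation 2.36 can be verified numerically. The sketch below uses SciPy's `sph_harm` (note SciPy's argument order: degree m, order n, then azimuth, then colatitude) with a simple midpoint quadrature over the sphere; the grid sizes are arbitrary.

```python
import numpy as np
from scipy.special import sph_harm

def sh_inner_product(n, m, n2, m2, n_theta=200, n_phi=200):
    """Numerically evaluate the integral of Eq. 2.36 by midpoint quadrature."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta       # colatitude
    phi = (np.arange(n_phi) + 0.5) * 2 * np.pi / n_phi         # azimuth
    T, P = np.meshgrid(theta, phi, indexing="ij")
    Y1 = sph_harm(m, n, P, T)            # Y_n^m(theta, phi)
    Y2 = sph_harm(m2, n2, P, T)
    w = np.sin(T) * (np.pi / n_theta) * (2 * np.pi / n_phi)    # area element
    return np.sum(Y1 * np.conj(Y2) * w)

# Orthonormality: 1 when (n, m) == (n', m'), 0 otherwise
same = sh_inner_product(2, 1, 2, 1)
diff = sh_inner_product(2, 1, 3, 0)
```

Because the integrands are low-degree trigonometric polynomials, this simple quadrature is essentially exact at these grid sizes.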

The solution to the differential equation in the radial coordinate, i.e. Equation 2.31, is
obtained by transforming it into the spherical Bessel equation [27, p. 193]. Hence, the
solutions are given as

R(r) = R1 jn (kr) + R2 yn (kr),  (2.38)

where jn (kr) and yn (kr) are the spherical Bessel functions of the first and second kind,
respectively. Alternatively, the solution can also be written using spherical Hankel
functions as

R(r) = R3 hn¹(kr) + R4 hn²(kr),  (2.39)

where hn¹(kr) and hn²(kr) are the spherical Hankel functions of the first and second kind,
respectively. It may be noted that

hn¹(kr) ∝ e^{jkr}

represents an outgoing wave, whereas

hn²(kr) ∝ e^{−jkr}

represents an incoming wave. In the rest of the thesis, hn (kr) will be used for the spherical
Hankel function of the first kind, hn¹(kr). The spherical Bessel and Hankel functions are
related to the Bessel and Hankel functions as
jn (x) ≡ (π/2x)^{1/2} J_{n+1/2}(x),  (2.40)

yn (x) ≡ (π/2x)^{1/2} Y_{n+1/2}(x),  (2.41)

hn¹(x) ≡ jn (x) + j yn (x) = (π/2x)^{1/2} [J_{n+1/2}(x) + j Y_{n+1/2}(x)],  (2.42)

hn²(x) ≡ jn (x) − j yn (x) = (π/2x)^{1/2} [J_{n+1/2}(x) − j Y_{n+1/2}(x)],  (2.43)

where j is the unit imaginary number, J_{n+1/2}(·) is the half-odd-integer order Bessel
function of the first kind, and Y_{n+1/2}(·) is the half-odd-integer order Bessel function of
the second kind (also known as the Neumann function).
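Equations 2.40-2.43 can be checked numerically with SciPy, which provides both the spherical Bessel functions (`spherical_jn`, `spherical_yn`) and ordinary Bessel functions of half-odd-integer order (`jv`, `yv`). The order and argument below are arbitrary test points.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, jv, yv

n, x = 3, 2.7                     # arbitrary order and argument

jn = spherical_jn(n, x)
yn = spherical_yn(n, x)

# Eqs. 2.40-2.41: relation to half-odd-integer order Bessel functions
jn_alt = np.sqrt(np.pi / (2 * x)) * jv(n + 0.5, x)
yn_alt = np.sqrt(np.pi / (2 * x)) * yv(n + 0.5, x)

# Eq. 2.42: spherical Hankel function of the first kind
h1n = jn + 1j * yn
```

Both routes agree to machine precision, which is a convenient check when implementing mode-strength expressions later in the thesis.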
Finally, a general solution to Equation 2.27, with e^{jωt} implicit, can be written as

P(r, θ, φ, ω) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} [A_{mn} jn (kr) + B_{mn} yn (kr)] Ynm(θ, φ)  (2.44)

for a standing wave type solution, and

P(r, θ, φ, ω) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} [C_{mn} hn¹(kr) + D_{mn} hn²(kr)] Ynm(θ, φ)  (2.45)

for a traveling wave solution, where the coefficients A_{mn}, B_{mn}, C_{mn} and D_{mn} are
generally complex valued. Note that the solutions given by Equations 2.44 and 2.45 are
frequency domain representations of p(r, θ, φ, t), where the temporal dependency is implicit
in the frequency dependence of the coefficients [27, p. 186].

2.5.1 Plane Wave Solution for Rigid Sphere

In this section, plane wave propagation is studied in the presence of scattering. Scattering
problems concern the propagation of waves that collide with some object. In particular, we
consider the scattering of plane waves from a rigid sphere. This is because most of the thesis
is centered around spherical microphone array processing, where the acoustic sensors
(microphones) are embedded on a rigid sphere.
A rigid spherical microphone array with radius ra and I microphones is taken into
consideration. For plane wave propagation (the far-field scenario), finding the pressure
becomes an interior problem, in which case only the spherical Bessel function is included
from the general solution in Equation 2.44. Let us consider a far-field source incident from
direction (θl , φl ) at a point ri = (r, θi , φi ), with r ≥ ra . As discussed in Section
2.4.1, the wave propagates in the opposite direction, denoted by (θp , φp ) = (π − θl , π + φl ).
Hence, the pressure on an open (imaginary) sphere due to a unit amplitude plane wave can be
written in terms of spherical harmonics as [27, p. 227]

e^{j kT ri} = 4π Σ_{n=0}^{∞} Σ_{m=−n}^{n} j^n jn (kr) [Ynm(θp , φp )]* Ynm(θi , φi ).  (2.46)

Equation 2.46 can be derived as in [34], and is called the Jacobi-Anger expansion. The
relation between spherical harmonics at diametrically opposite directions is [35]

Ynm(π − θ, π + φ) = (−1)^n Ynm(θ, φ).  (2.47)

From Equations 2.9 and 2.10, it is clear that the sign of the wavenumber changes when the
direction of arrival is used in place of the direction of propagation. Hence, utilizing
Equations 2.9, 2.10 and 2.47 together with the reflection formula of the spherical Bessel
function, jn (−z) = (−1)^n jn (z), the pressure in Equation 2.46 can be rewritten as

e^{−j klT ri} = 4π Σ_{n=0}^{∞} Σ_{m=−n}^{n} j^n jn (kr) [Ynm(θl , φl )]* Ynm(θi , φi ).  (2.48)

The pressure in Equation 2.48 represents the pressure without any scatterer in place. In this
case, the microphone array is called an open sphere, where just the microphones are placed at
I locations. However, for the case of a rigid sphere, the resultant of the incident and
scattered pressure must be taken into consideration. The scattered pressure is an exterior
problem, and hence its solution includes hn¹(kr) from the general solution in Equation 2.45.
Utilizing the boundary condition of zero radial velocity on the rigid sphere, the pressure in
Equation 2.48 can be rewritten as [27]

e^{−j klT ri} = 4π Σ_{n=0}^{∞} Σ_{m=−n}^{n} j^n [jn (kr) − (jn′(kra )/hn′(kra )) hn (kr)]
  [Ynm(θl , φl )]* Ynm(θi , φi ).  (2.49)

Combining Equations 2.48 and 2.49, the plane wave model in spherical coordinates can be
written as

e^{−j klT ri} = Σ_{n=0}^{∞} Σ_{m=−n}^{n} bn (k, r) [Ynm(θl , φl )]* Ynm(θi , φi ),  (2.50)

where bn (k, r) is called far-field mode strength. It is given by

bn (k, r) = 4πj n jn (kr), open sphere


� j � (kra ) �
= 4πj n jn (kr) − n� hn (kr) , rigid sphere (2.51)
hn (kra )
where r ≥ ra . Figure 2.5 illustrates mode strength bn as a function of kr and n for an
open sphere. For kr = 0.1, zeroth order mode amplitude is 22 dB, while the first order
has amplitude −8 dB. Hence, for order greater than kr, the mode strength bn decreases
significantly. Therefore, the summation in Equation 2.50 can be truncated to some finite
N ≥ kr, called the array order.

Figure 2.5: Variation of mode strength bn in dB as a function of kr and n for an open sphere.
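The plane wave expansion of Equation 2.50, with the open-sphere mode strength of Equation 2.51, can be verified numerically: truncating the sum at an order N comfortably above kr reproduces e^{−j klT ri} to high accuracy. The sketch below uses SciPy; the geometry is illustrative, and `sph_harm` takes (m, n, azimuth, colatitude).

```python
import numpy as np
from scipy.special import spherical_jn, sph_harm

def unit_vec(theta, phi):
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

k, r = 10.0, 0.2                       # so kr = 2
theta_l, phi_l = 0.7, 1.2              # source direction (arrival)
theta_i, phi_i = 1.1, 2.5              # microphone direction

# Left-hand side of Eq. 2.50: kl = -k u_l, hence -kl^T ri = k r (u_l . u_i)
lhs = np.exp(1j * k * r * (unit_vec(theta_l, phi_l) @ unit_vec(theta_i, phi_i)))

N = 15                                 # truncation order, well above kr
rhs = 0j
for n in range(N + 1):
    b_n = 4 * np.pi * (1j ** n) * spherical_jn(n, k * r)   # Eq. 2.51, open sphere
    for m in range(-n, n + 1):
        rhs += (b_n * np.conj(sph_harm(m, n, phi_l, theta_l))
                    * sph_harm(m, n, phi_i, theta_i))
```

Reducing N toward kr makes the truncation error visible, which mirrors the mode-strength decay discussed above.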

2.5.2 Spherical Wave Solution for Rigid Sphere

Similar to the plane wave, a spherical wave can also be expanded in terms of spherical
harmonics using the Jacobi-Anger expansion [34]. From Equations 2.25 and 2.50, the pressure
at the i-th microphone due to the l-th unit amplitude source located at rl is given in terms
of spherical harmonics as [24, 36]

e^{−jk|ri − rl |} / |ri − rl | = Σ_{n=0}^{N} Σ_{m=−n}^{n} bn (k, ra , rl ) [Ynm(θl , φl )]*
  Ynm(θi , φi ),  (2.52)

where bn (k, ra , rl ) is the n-th order near-field mode strength. It is related to the
far-field mode strength bn (k, ra ) as

bn (k, ra , rl ) = j^{−(n−1)} k bn (k, ra ) hn (krl ).  (2.53)

The far-field mode strength bn (k, r) is given in Equation 2.51. These final expressions for
the plane wave and spherical wave in terms of spherical harmonics will be used later to derive
some useful results.

2.5.3 Range Criterion for Near-field and Far-field in Source Localization

In this section, the criterion for near-field and far-field source localization based on the
range is discussed. Spherical wavefronts are assumed when sources are in the near-field
region. On the other hand, plane waves are assumed when sources are in the far-field. The
near-field and far-field criterion, in general, is determined by the Fresnel and Fraunhofer
distances [37]. The near-field Fresnel region is defined by

0.62 √(D³/λ) < rl < 2D²/λ,  (2.54)

where D is the array aperture and rl is the distance of the source from the array. The region
defined by rl > 2D²/λ corresponds to the far-field Fraunhofer region. However, these
parameters do not indicate the near-field range of a spherical microphone array.


Figure 2.6: Plot showing the nature of far-field and near-field mode strength for the Eigenmike
system. Near-field source is at rl = 1m and order is varied from n = 0 to n = 4.

The near-field criterion for a spherical array is presented in [38], based on the similarity
of the near-field mode strength |bn (k, ra , rl )| and the far-field mode strength
|bn (k, ra )|. The two functions start behaving in a similar manner at krl ≈ N, for an array
of order N. This is illustrated in Figure 2.6 for the rigid sphere Eigenmike system [39] with
rl = 1 m and the order varying from n = 0 to n = 4. Hence, the near-field condition for a
spherical array turns out to be

rNF ≈ N/k.  (2.55)

But rNF ≥ ra ; hence, for a source to be in the near-field, the range of the source should
satisfy

ra ≤ rl ≤ (kmax /k) ra ,  (2.56)

with kmax = N/ra .
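As a worked example of Equations 2.55-2.56, the sketch below computes the near-field range limit for a rigid-sphere array of order N = 4 and radius ra = 4.2 cm. These values are assumed here for illustration (typical of an Eigenmike-class array), not taken from the thesis.

```python
import numpy as np

c = 343.0            # speed of sound (m/s)
r_a = 0.042          # array radius (m), assumed Eigenmike-like value
N = 4                # array order

k_max = N / r_a      # maximum wavenumber implied by Eq. 2.56

def near_field_upper_limit(f):
    """Upper range limit r_NF = N/k = (k_max/k) * r_a, Eqs. 2.55-2.56."""
    k = 2 * np.pi * f / c
    return max(N / k, r_a)   # a source can never be closer than the surface

# At 1 kHz, k is about 18.3 rad/m, so sources within roughly 22 cm
# of the array center fall in the near-field for this configuration.
limit_1kHz = near_field_upper_limit(1000.0)
```

Note how the near-field region shrinks with frequency: at high frequencies the limit approaches the array surface itself.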

2.6 Summary

In this chapter, the spherical coordinate system and the wave propagation model are
described. Solutions for plane and spherical waves are provided in the Cartesian and
spherical coordinate systems. Spherical harmonics and mode strength are introduced along
with their significance in the context of this thesis. These concepts play a significant
role in the solution of the source localization problem and are also utilized in the
formulation of the steering vector. The concepts of the open sphere and rigid sphere are
also discussed, along with the definition of the near-field criterion for the spherical
microphone array.
Chapter 3

Microphone Array Signal Processing Techniques

3.1 Introduction

Microphone arrays utilize multiple microphones to exploit the additional spatial information
available. The signal acquired by a microphone array is spatially sampled by the microphones,
generating diversity in the space domain in terms of time delays. This is similar to
traditional digital signal processing, where diversity is present in the time domain: the
signal is sampled at different time instants, and this time sampling allows the design of an
FIR filter to select particular frequencies. Analogously, spatial sampling allows the design
of a spatial filter to pass sources from certain directions while rejecting sources from
other directions. This spatial filtering technique is also called beamforming.
In the previous chapter, the propagation of sound waves and a solution to the wave equation
were discussed. In this chapter, we express the pressure due to far-field sources, received
at the microphone array, in the form of a data model. The data model is derived from first
principles of physics. A brief discussion of commonly used source localization methods is
also provided using uniform linear arrays. These can be broadly divided into covariance-based,
beamforming-based and subspace-based methods. In the following section, various array
geometries are introduced.

3.2 Geometry of Microphone Array

Microphones can be arranged in various geometries to acquire the acoustic signal. Linear, planar and spherical microphone arrays are widely studied. Although the linear microphone array is simple in structure and processing, it is limited by front-back ambiguity. Planar arrays overcome the front-back ambiguity; however, they suffer from up-down ambiguity. The spherical microphone array can localize sources anywhere in space with no spatial ambiguity. The usefulness and the limitations of the uniform linear array, the planar array (in particular, the circular array) and the spherical microphone array are detailed in this Section.

Figure 3.1: Uniform Linear Array geometry.

Figure 3.2: Front-back ambiguity in ULA.

3.2.1 Uniform Linear Array

The simplest array configuration is that of a uniform linear array (ULA). Figure 3.1 shows a ULA with four microphones placed uniformly on the x-axis. The distance between two consecutive microphones is d. A far-field source is incident on the array at an azimuthal angle φ; the extra distance traveled by the wavefront between two consecutive microphones is d cos(φ). The configuration, and hence the localization problem formulation, for a ULA is simple. However, the ULA suffers from front-back ambiguity, also called north-south ambiguity, illustrated in Figure 3.2. It can be noticed from Figure 3.2 that a ULA can localize sources only in its own plane, with azimuth ranging in [0, π]. Also, it cannot differentiate between the two positions S1 and S2: the ULA is capable of estimating the incident angle with the x-axis, but it is unable to determine on which side of the x-axis the source lies. This is called front-back ambiguity.

3.2.2 Uniform Circular Array

In a uniform circular array (UCA), microphones are placed uniformly in a circular fashion as shown in Figure 3.3. A UCA can localize sources with any azimuth, i.e. φ ∈ [0, 2π], and elevation ranging from 0 to π/2. Although the circular array does not suffer from front-back ambiguity, it is limited by up-down ambiguity [40]. Another advantage is that a UCA is much more compact than a ULA for the same number of microphones and the same spatial aliasing condition.
Figure 3.3: Uniform circular array.

3.2.3 Spherical Microphone Array

The spherical microphone array can localize sources anywhere in space. Hence, the spherical microphone array is capable of measuring and analyzing a three dimensional sound field in an effective manner. It is also more compact than the UCA. The Eigenmike system [39] is a spherical microphone array with 32 microphones embedded on a rigid sphere of radius 4.2 cm. An Eigenmike system from mh-acoustics is shown in Figure 3.4.

Figure 3.4: Photograph of a spherical microphone array : The Eigenmike system.

3.3 Microphone Array Data Model

In this Section, the data model for signals acquired over a microphone array is discussed. A few assumptions are made to make the formulation analytically tractable. The sources are assumed to be in the far-field of the array. The transmission medium is assumed to be isotropic and non-dispersive. These assumptions allow a straight-line propagation model. The sources are assumed to be narrowband. The narrowband signal assumption is discussed first, prior to the development of the array data model.
The complex envelope representation of a narrowband signal is given as [41],

s(t) = x(t)e^{j(ωc t + y(t))} (3.1)

where x(t) and y(t) are slowly varying functions of time that define the amplitude and phase of s(t), and ωc is the known center frequency. The narrowband assumption implies

x(t − τ ) ≈ x(t) and y(t − τ ) ≈ y(t) (3.2)

for all possible propagation delays τ across the array elements. It is reasonable to assume that the envelope does not change significantly as the wavefront traverses the array from the reference point. In the frequency domain,

X(ω)e^{−jωτ} ≈ X(ω) and Y(ω)e^{−jωτ} ≈ Y(ω). (3.3)


3.3 Microphone Array Data Model 25

It can be noted from Equation 3.3 that for the narrowband condition to hold, the product of frequency and group delay for the amplitude envelope has to be negligible. Similarly, the product of frequency and phase delay for the phase envelope should be negligible. Mathematically, the following condition should be satisfied [42] for the narrowband assumption.

ωτ ≪ 1 (3.4)

Assuming the slowly varying nature of the amplitude and phase, as suggested in Equation 3.2, the delayed signal of Equation 3.1 can be written as

s(t − τ) ≈ s(t)e^{−jωc τ} (3.5)

From Equation 3.5, it may be noted that the effect of a time delay on the received waveform is simply a phase shift.
Now we consider an arbitrary microphone array with I identical and omnidirectional microphones. The position vector of the ith microphone is given by ri = (ri, Φi)^T, where Φi = (θi, φi) is the angular location and (.)^T denotes the transpose. A narrowband sound field of L plane waves is incident on the array. The direction of arrival of the lth source is denoted by Ψl = (θl, φl). For a planar wavefront (far-field case), the instantaneous pressure amplitude at the ith microphone due to the lth source is sl(t − τi(Ψl)), where τi(Ψl) is the delay of arrival at the ith microphone, w.r.t. some reference point, for the lth source sl(t). Note that the reference point can be any point in space or one of the microphones. However, in general practice, the reference point is taken to be the array centroid. The total pressure at the ith microphone amounts to [21, 43]

pi(Ψ; t) = Σ_{l=1}^{L} αi(Ψl) sl(t − τi(Ψl)) + vi(t) (3.6)

where αi(Ψl) is the temporal Green's function of the ith sensor for the lth source and vi is the uncorrelated sensor noise component. The noise is assumed to be baseband additive white Gaussian noise of power σ². The data model in Equation 3.6 is known as the anechoic data model [44].
Utilizing the narrowband approximation result of Equation 3.5, the pressure at the ith microphone can be re-written as

pi(Ψ; t) = Σ_{l=1}^{L} αi(Ψl) sl(t) e^{−jωc τi(Ψl)} + vi(t) (3.7)

Utilizing the omnidirectional assumption, αi(Ψl) = 1 ∀ i, l, and Equation 2.12, the pressure at the ith microphone in Equation 3.7 can be written as

pi(Ψ; t) = Σ_{l=1}^{L} sl(t) e^{−j kl^T ri} + vi(t) (3.8)

Taking Ns snapshots, Equation 3.8 can be re-written in matrix form as

p(t) = A(Ψ, k)s(t) + v(t), t = 1, 2, · · · , Ns (3.9)

where p(t) = [p1(t), p2(t), . . . , pI(t)]^T, A(Ψ, k) is the I × L steering matrix (also called the array manifold), s(t) = [s1(t), s2(t), · · · , sL(t)]^T is the vector of signal amplitudes at the reference point, and v(t) is the baseband additive white Gaussian sensor noise. The steering matrix A(Ψ, k) is expressed as

A(Ψ, k) = [a(Ψ1, k) a(Ψ2, k) . . . a(ΨL, k)] (3.10)

where a particular steering vector can be written as

a(Ψl, k) = [e^{−j kl^T r1} e^{−j kl^T r2} . . . e^{−j kl^T rI}]^T. (3.11)

The steering vector is also called the array manifold vector in the literature; both phrases are used interchangeably in this thesis. Utilizing Equation 2.12, the steering vector can also be written as a collection of phase shifts:

a(Ψl, k) = [e^{−jωc τ1(Ψl)} e^{−jωc τ2(Ψl)} · · · e^{−jωc τI(Ψl)}]^T (3.12)

Equation 3.9 is referred to as the spatio-temporal narrowband data model in its most general form. It can be utilized for any array configuration.
For the ULA shown in Figure 3.1, the position vector of the ith microphone can be given as

ri = [(i − 1)d 0 0]^T. (3.13)

It is to be noted that the elevation angle θ is 90° for a ULA. Therefore, the wavevector from Equation 2.10 can be written as

kl = −[k cos φl k sin φl 0]^T (3.14)

Hence, utilizing Equation 2.12, the propagation delay at the ith microphone can now be written as

τi(Ψl) = −(i − 1)d cos φl / c, i = 1, 2, · · · , I. (3.15)

Utilizing this in Equation 3.12, the steering vector for the ULA takes the form

a(φl, k) = [1 e^{jkd cos φl} e^{j2kd cos φl} · · · e^{j(I−1)kd cos φl}]^T (3.16)

where k = ωc/c. It can be observed that the steering vector for a ULA exhibits a Vandermonde structure.
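As an illustration, the steering vector of Equation 3.16 and its Vandermonde structure can be checked numerically; the array size, source frequency and speed of sound in this sketch are assumed values, not from the text:

```python
import numpy as np

def ula_steering_vector(phi, k, d, I):
    """Steering vector of Eq. 3.16: entries exp(j*(i-1)*k*d*cos(phi))."""
    return np.exp(1j * np.arange(I) * k * d * np.cos(phi))

c, f = 343.0, 1000.0                  # assumed speed of sound (m/s) and frequency (Hz)
lam = c / f
k = 2 * np.pi / lam                   # wavenumber k = omega_c / c
a = ula_steering_vector(np.deg2rad(60.0), k, lam / 2, 4)

# Vandermonde structure: every entry is a power of the common ratio z.
z = a[1] / a[0]
assert np.allclose(a, z ** np.arange(4))
```

The same helper evaluated on a grid of azimuths yields the steering matrix used by the beamforming and subspace methods of Section 3.4.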
For the UCA shown in Figure 3.3, the wavevector remains the same as in Equation 2.10. Microphones are uniformly distributed to form a circular array with radius ra. Noting that the elevation angle θ is 90° for all the microphones of a UCA, the position vector of the ith microphone is given as

ri = [ra cos φi ra sin φi 0]^T (3.17)

Hence, the steering vector for the UCA is given by Equation 3.12, with propagation delay

τi(Ψl) = −ra sin θl cos(φl − φi) / c. (3.18)

The steering matrix for the spherical microphone array is discussed in Chapter 5; a detailed derivation in the spherical harmonics domain is provided in Section 5.3.

3.3.1 Spatial Aliasing

A minimum sampling frequency (called the Nyquist frequency) is required to avoid aliasing of a time-sampled signal [45]. For microphone arrays, the signal is spatially sampled using microphones, and a similar condition on the spatial sampling frequency exists for sensor arrays. The Nyquist sampling theorem requires

fs = 1/Ts ≥ 2fmax (3.19)

where fs is the sampling frequency, Ts is the sampling period and fmax is the maximum frequency component in the spectrum of the signal. Similarly, for spatial sampling using a ULA, we have the requirement [46],

fxs = 1/d ≥ 2fxmax (3.20)

where fxs is the spatial sampling frequency in samples per meter, d is the spatial sampling period and fxmax is the highest spatial frequency component present in the spatial spectrum of the signal. The spatial frequency (number of cycles per meter) along the x-axis is given by

fx = sin θ cos φ / λ. (3.21)

Hence, maximizing the numerator and minimizing the denominator yields

fxmax = 1/λ. (3.22)

Utilizing Equations 3.20 and 3.22, the Nyquist condition for alias-free spatial sampling is given by

d ≤ λ/2 (3.23)

This can be interpreted as the Nyquist sampling theorem in the spatial domain.
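As a quick numerical reading of Equation 3.23, the largest alias-free spacing follows from the shortest wavelength in the band; the speed of sound and the 4 kHz band edge in this sketch are assumptions:

```python
def max_ula_spacing(f_max, c=343.0):
    """Largest alias-free inter-microphone spacing, d <= lambda/2 (Eq. 3.23)."""
    lam_min = c / f_max               # shortest wavelength present in the signal
    return lam_min / 2.0

# A signal band-limited to 4 kHz tolerates a spacing of roughly 4.3 cm at most.
d_max = max_ula_spacing(4000.0)
```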
For a UCA, a similar condition holds for the spacing between two consecutive array elements. However, the Nyquist condition is elucidated using phase mode excitation of the UCA [47, 48]. The highest phase mode M that can be excited in a UCA with I elements is given by

I ≥ 2M + 1 (3.24)

with M = kra. This reduces the aliasing in the beampattern. The condition in Equation 3.24 can be simplified to [48]

dcir ≤ λ/2 (3.25)

where dcir = 2πra/I is the circumferential spacing between adjacent array elements.
For a spherical microphone array of order N and radius ra, there is no significant spatial aliasing when working in the range N ≥ kra [49]. The terms are elaborated in Section 2.5.1. It is to be noted that the spatial sampling theorem is formulated with respect to a narrowband signal. The wideband nature of speech allows the microphone spacing to be increased beyond the Nyquist limit without suffering aliasing artifacts [50]. Additionally, the effect of spatial aliasing is observed as false peaks in the spatial spectrum. In beampattern plots, the effect is seen as the introduction of grating lobes in the visible range of the array. This is illustrated in Figure 3.13.

3.3.2 Acoustic Noise and Reverberation

The intelligibility of a signal acquired over a microphone array is affected by noise and reverberation. Additive acoustic noise consists of undesired external disturbances present as sound events; it can be observed during the silence periods of the source signal. The multi-path propagation phenomenon of sound waves is called reverberation. The effect of reverberation can be seen as a smearing of the speech in the spectrogram and the time domain waveform [51].
Acoustic noise does not refer to concrete statistical, frequency, spatial or propagation characteristics. Hence, the noise could be either stationary or nonstationary. Also, the noise could be directional (an interfering speech source) or non-directional (background noise). The noise may also be thermal noise generated by the sensor circuitry. The sensor noise and background noise are considered to be spatially white. The noise at different microphones is assumed to be uncorrelated, and the noise is also considered to be uncorrelated with the desired sources. The noise used in this thesis will be sensor noise with variance σ².
Reverberation arises because of multi-path propagation. The data model in Equation 3.6 takes only the direct path into account. In practice, however, signal propagation follows multiple paths in a reverberant environment, and the recorded signal consists of contributions from the direct and reflected paths. Due to multi-path propagation, the sound persists in space even after the original sound from the source has vanished. The duration for which the sound remains above a minimum audible level is called the reverberation time. In particular, it is measured as T60, defined as the time required for a sound in a room to decay by 60 dB [52].
The data model presented in Equation 3.6 includes only the direct path and is not valid in the case of reverberation. The received signal at the ith microphone in a reverberant environment is given by [44]

pi(t) = Σ_{l=1}^{L} hil ∗ sl(t) + wi(t), t = 1, 2, · · · , Ns (3.26)

where hil is the room impulse response (RIR) between the ith microphone and the lth source, and ∗ denotes convolution. The impulse response in a room consists of direct sound, early reflections and late reflections [51], as illustrated in Figure 3.5.
Figure 3.5: Illustration of various regions in a typical room impulse response (RIR).

The initial region in the room impulse response, with nearly zero amplitude, is followed by a peak; this region corresponds to direct-path propagation. The amplitude of the peak due

to direct-path propagation may be greater or less than the amplitude of the late reflections, depending on the distance of the source from the microphone. A strong direct path means the source is close to the microphones. The early reflections are often taken as the first 50 ms of the impulse response. They mainly originate from first-order reflections, have directionality and are highly correlated with the direct signal. The remaining part is referred to as late reflections, with considerably smaller magnitude. The late reflections can also be conceived of as spatially white noise.
Reverberation is measured using the reverberation time T60 or the direct to reverberant energy ratio (DRR). The DRR is defined as

DRR = 10 log10 ( Σ_{t=0}^{nd} h²(t) / Σ_{t=nd+1}^{∞} h²(t) ) dB (3.27)

where samples of h(t) up to nd represent direct-path propagation, while samples with indices greater than nd represent only the reverberation due to reflected paths. With an increase in DRR or a decrease in the reverberation time, the room impulse response h approaches a delta function, improving the accuracy of DOA estimation.
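Equation 3.27 translates directly into code; the synthetic impulse response below (direct peak, exponentially decaying tail, and the chosen n_d) is an assumption of this sketch:

```python
import numpy as np

def direct_to_reverberant_ratio(h, n_d):
    """DRR of Eq. 3.27: direct-path energy (samples 0..n_d) over the energy
    of the reflected tail (samples n_d+1 onward), in dB."""
    direct = np.sum(h[: n_d + 1] ** 2)
    reverberant = np.sum(h[n_d + 1 :] ** 2)
    return 10.0 * np.log10(direct / reverberant)

# Toy RIR: a strong direct-path peak followed by a decaying reflected tail.
rng = np.random.default_rng(0)
h = np.zeros(2000)
h[10] = 1.0                                          # direct-path peak
decay = np.exp(-np.arange(1900) / 300.0)             # exponential energy decay
h[100:] = 0.05 * rng.standard_normal(1900) * decay   # late reflections
drr = direct_to_reverberant_ratio(h, n_d=50)         # positive: strong direct path
```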

3.4 Acoustic Source Localization

Microphone array processing is utilized in applications like speech enhancement, hands-free communication and surveillance. The central problem in all such applications is source localization. In this Section, a brief background on various methods of source localization is provided. The acoustic pressure described by Equation 3.9 is utilized herein to provide a background to source localization using a ULA. Different methods of source localization using a ULA are discussed in the ensuing Sections.

3.4.1 Correlation-based Source Localization

In correlation-based source localization, the time delay of arrival (TDOA) is computed between a pair of microphones. The time delay corresponds to the lag at which the cross-correlation is maximum [44]. The DOA is estimated from the time delay using the relation in Equation 3.15. A ULA consisting of I microphones, with distance d between two consecutive microphones, is considered. The total number of microphone pairs is I(I − 1)/2, the number of combinations of I microphones taken two at a time. Several variants of correlation-based source localization are discussed herein.
Figure 3.6: Voiced frame of a speech signal of length 512 samples: original signal (top) and signal delayed by 40 samples (bottom).

3.4.1.1 Source Localization using Plain Time Correlation

The cross-correlation between two observed signals, p1(t) and p2(t), is defined as

r^{PTC}_{p1p2}(lg) = E[p1(t) p2*(t − lg)] (3.28)

where lg is the lag and (.)* denotes the complex conjugate. In practice, the cross-correlation of two finite signals is estimated as

r̂^{PTC}_{p1p2}(lg) = Σ_{t=−Ns}^{Ns} p1(t) p2*(t − lg). (3.29)

The cross-correlation r̂^{PTC}_{p1p2} attains its maximum when lg equals the actual delay τ; the proof can be seen in [44]. Hence, the TDOA can be estimated as

τ̂^{PTC} = (1/fs) argmax_{lg} r̂^{PTC}_{p1p2}(lg) (3.30)

where fs is the sampling rate. The DOA is then estimated using Equation 3.15. The concept is illustrated using Figures 3.6 and 3.7: two observed signals with a time lag of 40 samples are shown in Figure 3.6, and their cross-correlation is plotted in Figure 3.7, with the peak observed at a lag of −40. The plain time correlation method is simple to implement; however, its performance is limited by factors like signal self-correlation and reverberation.
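The estimator of Equations 3.29 and 3.30 can be sketched with NumPy; the 40-sample delay mirrors the example of Figures 3.6 and 3.7, while the signal length and sampling rate are assumptions:

```python
import numpy as np

def tdoa_plain_correlation(p1, p2, fs):
    """TDOA as the lag maximizing the cross-correlation (Eqs. 3.29-3.30)."""
    r = np.correlate(p1, p2, mode="full")
    lags = np.arange(-len(p2) + 1, len(p1))        # lag axis for 'full' mode
    return lags[np.argmax(r)] / fs

rng = np.random.default_rng(1)
fs = 16000
p1 = rng.standard_normal(512)
p2 = np.concatenate((np.zeros(40), p1))[:512]      # p1 delayed by 40 samples
tau = tdoa_plain_correlation(p1, p2, fs)           # peak at lag -40, as in Figure 3.7
```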
Figure 3.7: Plain time correlation plots.



Figure 3.8: Generalized cross-correlation (GCC), GCC-Roth and GCC-PHAT plots (top to bottom).

3.4.1.2 Source Localization using Generalized Cross-correlation

Generalized cross-correlation (GCC) was introduced to overcome the limitations of plain time correlation [53]. It implements a frequency domain cross-spectrum with a weighting function. Let the discrete Fourier transforms (DFT) of the two microphone signals be represented by p1(k) and p2(k). The general expression for GCC is given by

r^{GCC}_{p1p2}(lg) = F^{−1}{w(k) p1(k) p2*(k)} (3.31)

where F^{−1} stands for the inverse discrete-time Fourier transform and w(k) is the weighting function. The term w(k) p1(k) p2*(k) is called the generalized cross-spectrum. The TDOA estimate is obtained from the lag that maximizes the generalized cross-correlation,

τ̂^{GCC} = (1/fs) argmax_{lg} r^{GCC}_{p1p2}(lg) (3.32)

The DOA estimate is then given by Equation 3.15.
For w(k) = 1, GCC degenerates to the plain cross-correlation, implemented through the DFT and inverse DFT (IDFT).

In the GCC-Roth method, a Roth filter [54] weighs the GCC by the inverse of the auto-correlation of one of the signals. The Roth filter is given by

w^{ROTH}(k) = 1 / (p1(k) p1*(k)) (3.33)

For reverberant environments, the phase transform (PHAT) [53] weighting function is used for TDOA estimation with GCC. The PHAT weighting function is given by

w^{PHAT}(k) = 1 / |p1(k) p2*(k)| (3.34)

The PHAT filter normalizes the amplitude of the cross-spectral density of the two signals and utilizes only the phase information for computing the cross-correlation. Figure 3.8 plots all variants of GCC for the signals shown in Figure 3.6.
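A compact frequency-domain sketch of GCC with the PHAT weighting of Equation 3.34 follows; the small regularizing constant in the denominator is an implementation choice, not part of the text:

```python
import numpy as np

def gcc_phat_tdoa(p1, p2, fs):
    """TDOA via GCC-PHAT: whiten the cross-spectrum, keep only its phase."""
    n = len(p1) + len(p2)                       # zero-pad to avoid wrap-around
    P1 = np.fft.rfft(p1, n)
    P2 = np.fft.rfft(p2, n)
    cross = P1 * np.conj(P2)
    cross /= np.abs(cross) + 1e-12              # PHAT weighting (Eq. 3.34)
    r = np.fft.irfft(cross, n)
    max_lag = len(p2)
    r = np.concatenate((r[-max_lag:], r[: max_lag + 1]))  # lags -max_lag..max_lag
    return (np.argmax(r) - max_lag) / fs

rng = np.random.default_rng(2)
fs = 16000
p1 = rng.standard_normal(512)
p2 = np.concatenate((np.zeros(40), p1))[:512]   # p1 delayed by 40 samples
tau = gcc_phat_tdoa(p1, p2, fs)                 # -40/fs, matching Figure 3.8
```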

Figure 3.9: Beamformer block diagram

3.4.2 Beamforming-based Source Localization

Beamforming is a spatial filtering technique where the signal from a given direction is passed undistorted, while signals from all other directions are attenuated. It is equivalent to forming a beam in the look direction, which is done by weighting and summing the array outputs. This is illustrated in Figure 3.9.
The beamformed array output is given by

po(t) = w^H p(t) (3.35)

where w = [w1 w2 · · · wI]^T is the beamforming weight vector and (.)^H denotes the conjugate transpose. The power spectrum of the spatially filtered signal,

E{|po(t)|²} = w^H Rp w, where Rp = E[p(t) p(t)^H], (3.36)

should exhibit peaks at the DOAs of sources located in the field of view of the array. This technique is used in beamforming-based source localization. Different choices of weights lead to different beamforming techniques. Two prominent beamforming techniques are presented in the ensuing Sections.

3.4.2.1 Delay-and-Sum Beamforming

The delay-and-sum beamformer configuration is shown in Figure 3.10. The signal incident on the array suffers different delays at different microphones. Each microphone output is delayed so that the signal from the desired direction is aligned, and the aligned signals are summed to realize a delay-and-sum beamformer (DSB).

t2
t1
S
O t2
U
R
C
E t1

t2

t1
N
O
I
S 0
E

Figure 3.10: Delay-and-sum beamformer



The delay-and-sum beamformer design problem is formulated as [55]:

min_w w^H w subject to w^H a(φ, k) = 1 (3.37)

The solution to the optimization problem in Equation 3.37 results in the DSB weights

w = a(φ, k)/I, (3.38)

where a(φ, k) is the steering vector defined by Equation 3.16. The solution does not depend on the input signal and takes into consideration only the steering vector of the signal of interest; hence, the delay-and-sum beamformer is not adaptive. The spatial power spectrum for the DSB, from Equation 3.36, can now be written as

PDSB(φ) = a^H(φ) Rp a(φ). (3.39)

It is to be noted that the 1/I² term is dropped from the power spectrum, which does not affect the DOA estimation in any way. The delay-and-sum DOA estimates are given by the locations of the L highest peaks in the DSB spatial power spectrum, corresponding to the L sources. DSB-based source localization is inconsistent when multiple sources are present, and the bias of the estimates becomes significant for closely spaced and correlated sources.
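A minimal sketch of the DSB spectrum of Equation 3.39 on a scanning grid; the idealized rank-one covariance, wavelength-normalized spacing and noise level are assumptions of this example:

```python
import numpy as np

def dsb_spectrum(R_p, kd, I, phi_grid):
    """P_DSB(phi) = a^H(phi) R_p a(phi) of Eq. 3.39, evaluated over a grid."""
    A = np.exp(1j * np.outer(np.arange(I), kd * np.cos(phi_grid)))  # I x G
    return np.real(np.einsum("ig,ij,jg->g", A.conj(), R_p, A))

I, kd = 10, np.pi                       # half-wavelength spacing: kd = pi
a_true = np.exp(1j * np.arange(I) * kd * np.cos(np.deg2rad(60.0)))
R_p = np.outer(a_true, a_true.conj()) + 0.01 * np.eye(I)   # one source + noise
phi_grid = np.deg2rad(np.arange(0.0, 181.0))
P_dsb = dsb_spectrum(R_p, kd, I, phi_grid)
phi_hat = np.rad2deg(phi_grid[np.argmax(P_dsb)])           # peaks at 60 degrees
```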

3.4.2.2 Capon Beamforming

Contrary to the delay-and-sum beamformer, the minimum variance distortionless response (MVDR) beamformer, or Capon beamformer [2], is adaptive in the sense that it takes into account the signal characteristics along with the steering vector of the signal of interest. The Capon spatial filter design is based on maximizing the signal to interference plus noise ratio (SINR), defined as

SINR = E|w^H a(φ)s(t)|² / E|w^H v(t)|² = σs² |w^H a(φ)|² / (w^H Rv w) (3.40)

where σs² is the signal power of an individual source and Rv = E[v(t) v^H(t)]. Maximizing the SINR amounts to minimizing w^H Rv w, i.e., the variance of w^H v. The distortionless response constraint gives w^H a(φ) = 1. Hence, the minimum variance distortionless response formulation of the Capon beamformer is given by

min_w w^H Rv w subject to w^H a(φ) = 1 (3.41)

The solution to the constrained problem in Equation 3.41 is given by

w = Rv^{−1} a(φ) / (a^H(φ) Rv^{−1} a(φ)) (3.42)

However, Rv^{−1} is not available in practice; therefore, Rp is used in place of Rv. This results in the final form of the weight vector,

w = Rp^{−1} a(φ) / (a^H(φ) Rp^{−1} a(φ)) (3.43)

Utilizing the expression for the MVDR weights in Equation 3.36, the spatial power spectrum for MVDR can be written as

PMVDR(φ) = 1 / (a^H(φ) Rp^{−1} a(φ)). (3.44)

The MVDR DOA estimates are given by the L largest peaks in the MVDR power spectrum, corresponding to the L sources.
The MVDR filter steered to a certain direction φ attenuates any other signal impinging on the array from a DOA ≠ φ, whereas the DSB filter pays uniform attention to all other DOAs as well. DOA estimation using the DSB and MVDR power spectra is illustrated in Figure 3.11. A ULA with 10 microphones was used, with sources at 20° and 60°.
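The MVDR spectrum of Equation 3.44 can be sketched for this two-source setup; the idealized covariance, noise level and grid resolution are assumptions of the example:

```python
import numpy as np

I, kd = 10, np.pi                                  # 10-element half-wavelength ULA

def steer(phis):
    """Stack ULA steering vectors (Eq. 3.16) into an I x G matrix."""
    return np.exp(1j * np.outer(np.arange(I), kd * np.cos(phis)))

A_src = steer(np.deg2rad([20.0, 60.0]))            # sources as in Figure 3.11
R_p = A_src @ A_src.conj().T + 0.1 * np.eye(I)     # idealized covariance + noise

phi_grid = np.deg2rad(np.linspace(0.0, 180.0, 361))
A_grid = steer(phi_grid)
denom = np.real(np.einsum("ig,ij,jg->g", A_grid.conj(), np.linalg.inv(R_p), A_grid))
P_mvdr = 1.0 / denom                               # Eq. 3.44

# The two largest local maxima fall at the source directions.
interior = (P_mvdr[1:-1] > P_mvdr[:-2]) & (P_mvdr[1:-1] > P_mvdr[2:])
idx = np.where(interior)[0] + 1
doas = np.sort(np.rad2deg(phi_grid[idx[np.argsort(P_mvdr[idx])[-2:]]]))
```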
Figure 3.11: DOA estimation using (a) DSB and (b) MVDR method. A ULA with I = 10 microphones was used for sources located at 20° and 60°.

Figure 3.12: Delay-and-sum beampattern for ULA with no spatial aliasing for I = 10, φs = 90° and d = 0.5λ, (a) in Cartesian coordinates and (b) in polar coordinates.

3.4.2.3 Beampattern Analysis

The beampattern is defined as the magnitude of the spatial filter's directional response. It is also called the directivity pattern, array pattern or spatial pattern. Beampattern analysis gives insight into the design of spatial filters. For a given weight vector w of a beamformer, the beampattern specifies the response of the beamformer to a source arriving from an arbitrary direction in the field of view of the array. The beampattern is typically measured as the array response to a single plane wave [48]. Hence, the beamformed output can be written as

po(t) = w^H p(t) = w^H a(Ψ, k)s(t) + w^H v(t) (3.45)

where w^H a(Ψ, k) is the directional response of the array.
The formulation of the delay-and-sum beampattern for a ULA is now presented. A ULA aperture steered to direction φs is considered. The beampattern for such a ULA can be written as

G(φ, φs) = |w^H(φs) a(φ, k)| (3.46)

Utilizing the DSB weight from Equation 3.38, the beampattern for the ULA is given by

G(φ, φs) = (1/I)|a^H(φs, k) a(φ, k)| (3.47)

where |(.)| denotes the absolute value. Substituting the expression for the steering vector from Equation 3.16, the beampattern for a ULA can be written as

G(φ, φs) = (1/I)|Σ_{i=1}^{I} e^{j(i−1)kd(cos φ − cos φs)}| = |sin( (Ikd/2)(cos φ − cos φs) ) / ( I sin( (kd/2)(cos φ − cos φs) ) )| (3.48)

Beampatterns for different beamformers and different array apertures can be formulated along similar lines. Narrowband beampatterns of a delay-and-sum beamformer are illustrated in Figure 3.12 without spatial aliasing and in Figure 3.13 under aliasing. A ULA with 10 microphones is used, with steering angle φs = 90°. The delay-and-sum beampattern for a UCA is also plotted in Figure 3.14; a 10-element UCA was used with steering direction Ψs = (45°, 90°). The beampatterns for the spherical microphone array will be presented in Chapters 5 and 7, in the spherical harmonics domain. Additional details on other parameters of microphone arrays can be found in [56].
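The closed form in Equation 3.48 can be cross-checked against a direct evaluation of the inner product of Equation 3.47; a small sketch for the broadside setting of Figure 3.12:

```python
import numpy as np

def ula_beampattern(phi, phi_s, I, kd):
    """DSB beampattern of Eq. 3.47: |a^H(phi_s) a(phi)| / I."""
    i = np.arange(I)
    a = np.exp(1j * i * kd * np.cos(phi))
    a_s = np.exp(1j * i * kd * np.cos(phi_s))
    return np.abs(np.vdot(a_s, a)) / I

I, kd = 10, np.pi                                   # d = 0.5 lambda, as in Figure 3.12
G_look = ula_beampattern(np.deg2rad(90.0), np.deg2rad(90.0), I, kd)

# Closed form of Eq. 3.48 agrees with the direct inner product off broadside.
phi, phi_s = np.deg2rad(60.0), np.deg2rad(90.0)
delta = np.cos(phi) - np.cos(phi_s)
G_direct = ula_beampattern(phi, phi_s, I, kd)
G_closed = abs(np.sin(I * kd * delta / 2) / (I * np.sin(kd * delta / 2)))
```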
Figure 3.13: Delay-and-sum beampattern for ULA under aliasing for I = 10, φs = 90° and d = 2λ, (a) in Cartesian coordinates and (b) in polar coordinates.

3.4.3 Subspace-based Source Localization

Although correlation-based and beamforming-based source localization methods are widely used in many applications, they are often limited in their resolution ability.

Figure 3.14: Illustration of the delay-and-sum beampattern for a UCA with I = 10, Ψs = (45°, 90°), under no spatial aliasing, (a) in the spherical coordinate system and (b) in the rectangular coordinate system.

These methods fail in multi-source environments when sources are closely spaced. The limitation arises because they do not exploit the sensor array data efficiently. Schmidt proposed the MUltiple SIgnal Classification (MUSIC) algorithm [3], based on the decomposition of the array covariance matrix into noise and signal subspaces. A geometrical interpretation of the MUSIC algorithm was also given in [3]. The MUSIC algorithm forms the basis of various other subspace-based methods like root-MUSIC [20], Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [21], minimum-norm [57] and MUSIC-Group delay [19]. In the following sections we present MUSIC, root-MUSIC and MUSIC-Group delay for the ULA.

3.4.3.1 The MUSIC Method

The MUSIC algorithm is a high resolution source localization method which utilizes the eigenstructure of the input covariance matrix. However, it requires very precise and accurate array calibration. The narrowband data model for the ULA from Equation 3.9 can be re-written as
[p1(t), p2(t), · · · , pI(t)]^T = [a1(φ1, k) a2(φ2, k) · · · aL(φL, k)] [s1(t), s2(t), · · · , sL(t)]^T + v(t) (3.49)

p(t) = As(t) + v(t). (3.50)

Geometrically, the received data p and the steering vectors al can be seen as vectors in an I-dimensional space, and p is a linear combination of the steering vectors with the sl as coefficients.
The array covariance matrix can be written as

Rp = E[p p^H] = E[A s s^H A^H] + E[v v^H] = A Rs A^H + σ²I = Ri + σ²I (3.51)

where Rs is the signal covariance matrix given by

Rs = E[s s^H] = diag( E[|s1|²], E[|s2|²], · · · , E[|sL|²] ) (3.52)

and I is the identity matrix. It can be noted that Rs is an L × L diagonal matrix with all eigenvalues (the diagonal elements) positive, making Rs a positive definite matrix. The steering matrix A comprises steering vectors which are linearly independent; hence, A has full column rank. The full column rank of A and the positive definiteness of Rs guarantee that, when the number of sources L is less than the number of sensors I, the I × I matrix Ri is positive semidefinite with rank L. This implies that I − L eigenvalues of Ri are zero. Hence, taking qu to be the uth eigenvector corresponding to a zero eigenvalue, we have

Ri qu = A Rs A^H qu = 0 (3.53)

qu^H A Rs A^H qu = 0 (3.54)

As Rs is a positive definite matrix, the following conditions hold:

A^H qu = 0 (3.55)

al^H(φl) qu = 0 ∀ l = 1, 2, · · · , L and ∀ u = 1, 2, · · · , I − L. (3.56)

Equation 3.56 implies that all the (I − L) noise eigenvectors qu are orthogonal to the L steering vectors. All such noise eigenvectors are collected as the columns of an I × (I − L) matrix Qn, called the noise subspace. Now, the MUSIC spectrum is formulated as

PMUSIC(φ) = 1 / Σ_{u=1}^{I−L} |a^H(φ) qu|² = 1 / ||Qn^H a(φ)||² = 1 / (a^H(φ) Qn Qn^H a(φ)). (3.57)

As the noise eigenvectors are orthogonal to the steering vectors, the denominator becomes zero when φ equals a DOA. Hence, the DOAs are estimated from the L largest peaks in the MUSIC spectrum, corresponding to the L incident sources.

3.4.3.2 Computing MUSIC Spectrum from Sample Covariance Matrix

It is to be noted that in practice, the array covariance matrix Rp is available for processing, and not Ri. Additionally, Rp is estimated as the sample covariance matrix over Ns snapshots, given by

R̂p = (1/Ns) Σ_{t=1}^{Ns} p(t) p^H(t) (3.58)

When the data is Gaussian, the sample covariance matrix converges to the true covariance matrix. Now, Qn has to be estimated from R̂p. Let qi be any eigenvector of R̂i with eigenvalue λi; then

R̂i qi = λi qi

R̂p qi = R̂i qi + σ²I qi = (λi + σ²) qi

This means that any eigenvector of R̂i is also an eigenvector of R̂p, with eigenvalue (λi + σ²). So, if R̂i = QΛQ^H, then

R̂p = Q[Λ + σ²I]Q^H = Q diag(λ1 + σ², λ2 + σ², · · · , λL + σ², σ², · · · , σ²) Q^H (3.59)

The eigenvector matrix Q is decomposed into a signal subspace Qs and a noise subspace Qn as follows. The eigenvectors corresponding to the L largest eigenvalues form the signal subspace matrix of order I × L. The remaining I − L columns of Q (the noise eigenvectors), with eigenvalues σ², form the noise subspace Qn. It is to be noted that the noise eigenvalues are negligible compared to the signal eigenvalues. The MUSIC spatial spectrum can now be computed as in Equation 3.57.
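The eigenvalue structure of Equation 3.59 can be verified numerically: adding σ²I shifts every eigenvalue of R̂i by σ² and leaves the eigenvectors unchanged, so the I − L smallest eigenvalues of R̂p equal σ². A toy 4 × 4 check (all numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
Ri = B @ B.conj().T                  # rank-2 "signal" covariance (L = 2, I = 4)
sigma2 = 0.3
Rp = Ri + sigma2 * np.eye(4)         # array covariance with white noise

wi = np.linalg.eigvalsh(Ri)          # ascending eigenvalues
wp = np.linalg.eigvalsh(Rp)
assert np.allclose(wp, wi + sigma2)  # every eigenvalue shifts by sigma^2
assert np.allclose(wp[:2], sigma2)   # I - L smallest eigenvalues equal sigma^2
```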
Figure 3.15: MUSIC-Magnitude spectrum for DOA 60◦ and 65◦ using 5 sensors (top) and for
15 sensors (bottom).

The MUSIC spectrum is plotted in Figure 3.15 for two closely spaced sources at 60◦ and 65◦, using 5 and 15 sensors. The MUSIC spectrum is also called the MUSIC-Magnitude spectrum, as it utilizes the magnitude spectrum of MUSIC, as can be seen from Equation 3.57.

3.4.3.3 The MUSIC-Group Delay Method

MUSIC-Group delay, as proposed in [19], utilizes the phase spectrum of MUSIC for robust source localization. The MUSIC-Magnitude spectrum requires a large number of sensors to resolve closely spaced sources, as shown in Figure 3.15. In reverberant environments, it requires a comprehensive search algorithm for deciding candidate peaks for DOA due to a large number of spurious peaks [58].
The MUSIC-Group delay spectrum is defined as [19]

P_MGD(φ) = ( Σ_{u=1}^{I−L} |∇ arg(a^H(φ) qu)|² ) P_MUSIC(φ)    (3.60)

where ∇ arg indicates the gradient of the unwrapped phase spectrum of a^H(φ)qu, taken with respect to the spatial variable φ. A sharp transition at the DOAs is observed in
unwrapped phase spectrum of MUSIC. Hence, gradient of the unwrapped phase spectrum
(group delay of MUSIC) results in sharp peaks at the location of the DOAs. In practice,
abrupt changes can occur in the phase due to small variations in the signal caused by microphone calibration errors. This leads to spurious peaks in the group delay spectrum. However,
the product of MUSIC and group delay spectra, called MUSIC-Group delay [19], removes
such spurious peaks and gives high resolution estimation.
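One way to realize Equation 3.60 numerically is to unwrap the phase of a^H(φ)qu along the search grid, take its gradient, and weight the summed squared gradients by the MUSIC-Magnitude spectrum. A hedged Python sketch, assuming an 8-sensor half-wavelength ULA and an idealized covariance; the function name mgd_spectrum and all parameters are ours, not the thesis code:

```python
import numpy as np

def mgd_spectrum(Qn, steer, grid):
    """MUSIC-Group delay (Eq. 3.60): summed squared phase gradients of
    a^H(phi) q_u, multiplied by the MUSIC-Magnitude spectrum."""
    A = np.column_stack([steer(p) for p in grid])   # I x G
    F = A.conj().T @ Qn                             # F[g, u] = a^H(phi_g) q_u
    pm = 1.0 / np.sum(np.abs(F) ** 2, axis=1)       # MUSIC magnitude (Eq. 3.57)
    ph = np.unwrap(np.angle(F), axis=0)             # unwrap along the phi grid
    gd = np.sum(np.gradient(ph, grid, axis=0) ** 2, axis=1)
    return gd * pm

# Assumed setup: 8-sensor half-wavelength ULA, sources at 50 and 60 degrees.
I, L, kd = 8, 2, np.pi
steer = lambda phi: np.exp(-1j * kd * np.cos(phi) * np.arange(I))
A = np.column_stack([steer(np.deg2rad(d)) for d in (50.0, 60.0)])
R = A @ A.conj().T + 0.01 * np.eye(I)   # idealized covariance, unit-power sources
w, V = np.linalg.eigh(R)
Qn = V[:, : I - L]                      # noise subspace
grid = np.deg2rad(np.linspace(1.0, 89.0, 441))
P = mgd_spectrum(Qn, steer, grid)
```

The product retains sharp maxima only where the magnitude spectrum also peaks, which is the spurious-peak suppression described above.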
Figure 3.16 illustrates the MUSIC, unwrapped phase, and MUSIC-Group delay spectra for two sources with azimuths (60◦, 65◦) and (50◦, 60◦). The high resolving capability of MUSIC-Group delay can be seen using a limited number of sensors. The MUSIC-Group delay spectrum is able to preserve the peaks corresponding to the DOAs due to the additive property of group delay spectra. A mathematical proof of the additive property using a ULA is given in [19]. A detailed comparison between the corresponding spectra obtained for a ULA and a UCA is provided in Chapter 4.


Figure 3.16: MUSIC, Unwrapped phase (of MUSIC) and MUSIC-Group delay spectra for
two sources with azimuth (a) 60◦ and 65◦ , (b) 50◦ and 60◦ .

3.4.3.4 The MUSIC-Group Delay Method using Shrinkage Estimators

This work proposes to utilize the covariance matrix estimated using shrinkage estimators in the computation of MUSIC-Group delay [59]. Shrinkage estimators are a widely used class of estimators that regularize the covariance matrix by shrinking it toward some target structure Rt [60]. The estimate is formulated as a linear combination of the unbiased estimate (sample covariance matrix) Ru and a biased (target) estimate Rt. Hence, the covariance matrix is estimated using the shrinkage estimator as

R̂p = βRt + (1 − β)Ru    (3.61)

where β ∈ [0, 1] denotes the shrinkage intensity. The value of β is chosen so that the average likelihood of omitted samples is maximized, as suggested in [61].
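Equation 3.61 is straightforward to implement once a target Rt is fixed. The diagonal of the sample covariance is one common target choice ([60] discusses several); the diagonal target used below, the function name shrink_covariance, and all parameters are illustrative assumptions:

```python
import numpy as np

def shrink_covariance(Ru, beta):
    """Shrinkage estimate of Eq. 3.61 with an assumed diagonal target:
    R_hat = beta * Rt + (1 - beta) * Ru, Rt = diag(Ru)."""
    Rt = np.diag(np.diag(Ru).real).astype(Ru.dtype)
    return beta * Rt + (1.0 - beta) * Ru

# Ill-conditioned sample covariance: 10 sensors, only 12 snapshots.
rng = np.random.default_rng(2)
X = rng.standard_normal((10, 12)) + 1j * rng.standard_normal((10, 12))
Ru = X @ X.conj().T / 12
Rhat = shrink_covariance(Ru, beta=0.3)
```

The shrunk estimate stays Hermitian and regularizes the eigenvalue spread of the ill-conditioned sample covariance, which stabilizes the subsequent subspace decomposition.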
Subspace-based methods are prone to errors at low signal-to-noise ratio (SNR) and high reverberation. Under such conditions, the noise eigenvalues are no longer negligible and may become comparable to the signal eigenvalues, leading to erroneous results [62]. However, the shrinkage method of estimating the correlation matrix does suppress the noise eigenvalues, as can be seen from Figure 3.17.
Figure 3.17: Eigenvalue estimation using the sample covariance matrix and the shrinkage estimator, using 10 sensors for 3 sources located at 20◦, 35◦ and 50◦.

The high resolution of the MUSIC-Group delay with the shrinkage estimator (SMGD), as compared to the MUSIC-Group delay (MGD) and the MUSIC-Magnitude computed from the sample covariance matrix, can be seen in Figure 3.18.

3.4.3.5 The root-MUSIC Method

The subspace-based methods described so far have a significant limitation: their accuracy is limited by the discretization grid over which the spectrum (P_MVDR(φ), P_MUSIC(φ), or P_MGD(φ)) is estimated. Moreover, they require a comprehensive search algorithm for deciding the candidate peak corresponding to the DOA of a source. root-MUSIC, proposed in [20], is a search-free algorithm that estimates DOAs as the roots of the MUSIC polynomial. Hence, the solution is exact and not limited by the discretization.
The MUSIC spectrum in Equation 3.57 can also be written as

P_MUSIC^{−1}(φ) = a^H(φ) Qn Qn^H a(φ) = a^H(φ) C a(φ)    (3.62)

where C = Qn Qn^H.

Substituting z = e^{jkd cos(φ)} in Equation 3.16, the steering vector for a ULA can be expressed as

a(φ) = [1  z  z²  · · ·  z^{I−1}]^T.    (3.63)

Figure 3.18: The MUSIC-Magnitude spectrum (top), the MUSIC-GD spectrum (middle), and the MUSIC-GD spectrum with shrinkage estimation (bottom) using 6 sensors for closely spaced sources located at 20◦ and 25◦, at DRR = 20 dB.

Utilizing Equation 3.63 in Equation 3.62, the MUSIC spectrum can now be written in polynomial form (called the root-MUSIC polynomial), as shown below.


P_MUSIC^{−1}(z) = Σ_{m=0}^{I−1} Σ_{n=0}^{I−1} z^n Cmn z^{−m}    (3.64)

P(z) = Σ_{m=0}^{I−1} Σ_{n=0}^{I−1} Cmn z^{n−m}    (3.65)

The double summation in Equation 3.65 can be reduced to a single summation by substituting n − m = r, which gives

P(z) = Σ_{r=−(I−1)}^{I−1} Cr z^r    (3.66)

where Cr = Σ_{n−m=r} Cmn.

It can be observed that the root-MUSIC polynomial is of degree (2I − 2), with (2I − 2) roots. Additionally, if z is a root of the polynomial, then 1/z* is also a root, as can be seen from the definition of z. Since z and 1/z* have the same phase and reciprocal magnitudes, one root lies within the unit circle while the other lies outside. Hence, (I − 1) roots are within the unit circle and the remaining (I − 1) roots are outside. As the DOA information is present in the phase,

Figure 3.19: Z-Plane representation of all the roots of root-MUSIC polynomial using 8 sensors
for 2 sources with locations 40◦ and 50◦ .

which is the same for both sets, either set of roots can be utilized for DOA estimation. Also, without noise, all roots would fall on the unit circle; because of noise, the roots move away from it. Hence, out of the (I − 1) roots within the unit circle, the L roots closest to the unit circle are used for DOA estimation. The azimuth is estimated using
φ = cos^{−1}{ ℑ(ln z) / (kd) }    (3.67)

where ℑ denotes the imaginary part. Figure 3.19 plots the roots of the root-MUSIC polynomial using 8 sensors for two sources at 40◦ and 50◦.
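Equations 3.62-3.67 translate into a short search-free routine: form C = Qn Qn^H, sum its diagonals to obtain the coefficients Cr, root the polynomial, keep the L roots inside and closest to the unit circle, and map their phases to angles. A minimal Python sketch under assumed parameters (root_music is our name, not a library routine):

```python
import numpy as np

def root_music(R, L, kd):
    """Search-free DOA estimation for a ULA (Eqs. 3.62-3.67)."""
    I = R.shape[0]
    w, V = np.linalg.eigh(R)
    Qn = V[:, : I - L]
    C = Qn @ Qn.conj().T
    # C_r = sum over the r-th diagonal of C (Eq. 3.66); coefficients are
    # ordered from z^(I-1) down to z^-(I-1) for np.roots.
    coeffs = np.array([np.trace(C, offset=r) for r in range(I - 1, -I, -1)])
    roots = np.roots(coeffs)
    roots = roots[np.abs(roots) < 1.0]                          # inside unit circle
    roots = roots[np.argsort(np.abs(np.abs(roots) - 1.0))][:L]  # closest to circle
    return np.rad2deg(np.arccos(np.imag(np.log(roots)) / kd))   # Eq. 3.67

# Assumed setup: 8-sensor half-wavelength ULA, sources at 40 and 50 degrees.
I, L, kd, Ns = 8, 2, np.pi, 400
steer = lambda phi: np.exp(1j * kd * np.cos(phi)) ** np.arange(I)  # [1 z .. z^(I-1)]
A = np.column_stack([steer(np.deg2rad(d)) for d in (40.0, 50.0)])
rng = np.random.default_rng(3)
S = rng.standard_normal((L, Ns)) + 1j * rng.standard_normal((L, Ns))
N = 0.1 * (rng.standard_normal((I, Ns)) + 1j * rng.standard_normal((I, Ns)))
X = A @ S + N
R = X @ X.conj().T / Ns
doas = np.sort(root_music(R, L, kd))
```

No grid is searched: the DOAs come out directly as the phases of the selected polynomial roots.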

3.5 Wideband Source Localization

The algorithms discussed in the previous section are limited to narrowband source localization. They cannot be directly applied to speech signals, which are wideband in nature. For narrowband sources, a time delay directly translates to a phase shift in the frequency domain, and this phase shift is approximately constant over the signal bandwidth. As can be seen from Equation 3.68, the phase shift is a function of the delay only, which in turn depends on the array structure and the source location. Hence, DOA estimation can be performed utilizing classical narrowband source localization algorithms.

s(t − τ) ↔ S(f) e^{−j2πfτ} ≈ S(f) e^{−j2πfc τ} ↔ s(t) e^{−j2πfc τ}    (3.68)



When the signal is wideband, the phase shift is a function of frequency as well, in addition to the source location and array geometry. The phase shift is no longer constant over the frequencies of interest. Also, the number of significant eigenvalues of the array covariance matrix becomes larger than the number of sources L, due to the mixing of different frequency components [63]. Hence, as the bandwidth of the source increases, decomposing the covariance matrix into signal and noise subspaces becomes difficult.
In order to deal with localization of wideband sources, the array output is decomposed into multiple narrowband frequency components using the fast Fourier transform (FFT). The array output is segmented into Ns snapshots, and the temporal FFT is applied to each snapshot to determine K frequency components. Representing the array output for the tth snapshot (where t = 1, 2, . . . , Ns) and the κth frequency component as Pt,κ, the sample covariance matrix is computed as

R̂Pκ = (1/Ns) Σ_{t=1}^{Ns} Pt,κ Pt,κ^H    (3.69)

for κ = 0, 1, . . . , K − 1. Based on the way the information in the covariance matrix is utilized, wideband source localization methods are divided into two categories: incoherent and coherent.
Incoherent methods for wideband source localization process each frequency bin independently: narrowband source localization is applied over each frequency bin, and an average DOA estimate is found over all frequency bins. The incoherent MUSIC-Magnitude spectrum can thus be written as

P_MUSIC(φ) = 1 / Σ_{κ=0}^{K−1} (a^H(fκ, φ) Qn Qn^H a(fκ, φ))    (3.70)

Such an incoherent combination of information across frequency bins gives accurate DOA estimates at high SNR for well-separated sources. The coherent approach to wideband source localization is presented in [64].
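The incoherent pipeline of Equations 3.69-3.70 can be sketched as: a per-bin sample covariance, a per-bin noise subspace, and a sum of the noise-subspace projections over bins before inversion. Everything below (the function name incoherent_music, the ULA geometry, the two frequency bins) is an illustrative assumption, not the thesis configuration:

```python
import numpy as np

def incoherent_music(bins, freqs, d, c, grid, L):
    """Incoherent wideband MUSIC (Eq. 3.70) for a ULA with spacing d."""
    denom = np.zeros(len(grid))
    for Pk, f in zip(bins, freqs):
        I = Pk.shape[0]
        R = Pk @ Pk.conj().T / Pk.shape[1]      # per-bin covariance (Eq. 3.69)
        w, V = np.linalg.eigh(R)
        Qn = V[:, : I - L]
        k = 2.0 * np.pi * f / c
        A = np.exp(-1j * k * d * np.arange(I)[:, None] * np.cos(grid)[None, :])
        denom += np.sum(np.abs(Qn.conj().T @ A) ** 2, axis=0)
    return 1.0 / denom

# One source at 70 degrees observed in two narrowband bins (1 kHz, 1.5 kHz).
c, d, I, L = 343.0, 0.04, 8, 1
grid = np.deg2rad(np.linspace(1.0, 179.0, 357))
freqs, bins = [1000.0, 1500.0], []
rng = np.random.default_rng(4)
for f in freqs:
    a = np.exp(-1j * (2 * np.pi * f / c) * d * np.cos(np.deg2rad(70.0)) * np.arange(I))
    S = rng.standard_normal((1, 100)) + 1j * rng.standard_normal((1, 100))
    N = 0.1 * (rng.standard_normal((I, 100)) + 1j * rng.standard_normal((I, 100)))
    bins.append(a[:, None] @ S + N)
P = incoherent_music(bins, freqs, d, c, grid, L)
```

Summing the projections in the denominator, rather than averaging per-bin DOA estimates, keeps bins with deep nulls from dominating the combined spectrum.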

3.6 Summary

The array data model used in source localization is discussed in this chapter. Various approaches to source localization over a uniform linear array are also discussed. Among correlation-based, beamforming-based, and subspace-based methods, the subspace-based methods exhibit high resolution. A robust method for source localization based on the phase information of MUSIC, called MUSIC-Group delay, is also described. The MUSIC-Group delay method using shrinkage estimators is introduced for robust source localization over a uniform linear array. Effects of noise and reverberation are discussed in the context of source localization. Methods for wideband source localization are also briefly discussed.
Chapter 4

MUSIC-Group Delay Method for Source Localization over Planar Microphone Array

4.1 Introduction

Planar microphone arrays can localize sources anywhere in the azimuthal plane, with elevation in the range of 0◦ to 90◦. They are also more compact than linear arrays for the same number of microphones. Hence, various planar arrays have been used for source localization, including rectangular, circular, and V-shaped geometries [65, 66, 67, 68].
As discussed in the previous chapter, correlation-based and beamforming-based source localization methods provide inconsistent results when multiple sources are present. The bias of the estimates may become significant when the sources are closely spaced, correlated, or in reverberant environments. Among subspace-based methods, MUltiple SIgnal Classification (MUSIC) is widely studied due to its computational efficiency. However, it requires a large number of sensors to resolve closely spaced sources. In reverberant environments, it requires a comprehensive search algorithm for deciding candidate peaks for direction of arrival (DOA) due to a large number of spurious peaks [58].
A MUSIC algorithm for source localization using a uniform circular array (UCA) can be found in [48]. UCA-RB (Real-Beamspace) MUSIC, proposed in [47], utilizes a phase-mode-excitation-based transformation. Conventionally, the spectral magnitude of MUSIC is utilized for computing the DOAs of multiple sources incident on the array of sensors. The phase information of the MUSIC spectrum has been studied in [18] for DOA estimation over a uniform linear array (ULA).
In this Chapter, the negative differential of the unwrapped phase spectrum (group delay)
of MUSIC is proposed for DOA estimation over planar arrays. Although the group delay
function has been used widely in temporal frequency processing for its high resolution prop-
erties [4], the additive property of the group delay function has hitherto not been utilized in
spatial spectrum analysis. In the following section, the MUSIC-Group delay (MGD) spectrum is discussed for robust source localization using a uniform circular array.

4.2 The MUSIC-Group Delay Method for Robust Multi-source Localization

Subspace-based methods for DOA estimation based on the spectral magnitude of MUSIC require a large number of sensors for resolving spatially close sources and are prone to errors under reverberant conditions. In [19], a method for high-resolution source localization based on the MUSIC-Group delay spectrum over a ULA has been proposed, which is able to resolve closely spaced sources with a limited number of sensors. In the following section, a MUSIC-Group delay based method for two-dimensional source localization over planar arrays is proposed.

4.2.1 MUSIC-Group Delay Method for Source Localization over Planar Array

As shown in Section 3.3, the received pressure over a planar array with I microphones from L (L < I) narrowband sources can be written as

p(t) = A(Ψ, k)s(t) + v(t) , t = 1, 2, · · · , Ns (4.1)



where Ψ = (θ, φ) is the angular location of a source, with θ the elevation and φ the azimuth, as defined in Section 2.2. A(Ψ, k) is the I × L steering matrix, expressed as

A(Ψ, k) = [a(Ψ1, k)  a(Ψ2, k)  . . .  a(ΨL, k)].    (4.2)

s(t) = [s1(t), s2(t), . . . , sL(t)]^T is the vector of signal amplitudes at the reference point, and (.)^T denotes the transpose of (.). A particular steering vector a(Ψ, k), consisting of time delays, can be expressed as

a(Ψ, k) = [e^{−jωc τ1(Ψ)}  e^{−jωc τ2(Ψ)}  · · ·  e^{−jωc τI(Ψ)}]^T    (4.3)

where τi is the time delay at the ith microphone with respect to the reference microphone, and ωc is the narrowband signal frequency. The noise v is assumed to be a stationary, zero-mean, uncorrelated random process. From Equation 3.18, the delay τi(Ψl) is related to the azimuth and elevation angles as

τi(Ψl) = −ra sin θl cos(φl − φi) / c    (4.4)

where ra is the radius of the circular array, φi is the azimuth angle of the ith microphone with the center of the circular array as the reference, and c is the speed of sound.
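Equations 4.3-4.4 combine into a one-line steering vector for the UCA. A small sketch follows; the function name uca_steering and the 8-microphone, 5 cm, 2 kHz configuration are assumptions for illustration:

```python
import numpy as np

def uca_steering(theta, phi, sensor_az, ra, omega_c, c=343.0):
    """Steering vector of Eqs. 4.3-4.4 for a uniform circular array:
    tau_i = -ra * sin(theta) * cos(phi - phi_i) / c."""
    tau = -ra * np.sin(theta) * np.cos(phi - sensor_az) / c
    return np.exp(-1j * omega_c * tau)

# Assumed geometry: 8 microphones on a 5 cm circle, 2 kHz narrowband source.
I, ra, omega_c = 8, 0.05, 2.0 * np.pi * 2000.0
sensor_az = 2.0 * np.pi * np.arange(I) / I
a = uca_steering(np.deg2rad(15.0), np.deg2rad(50.0), sensor_az, ra, omega_c)
```

Each entry has unit magnitude, and a source at zero elevation (θ = 0) yields zero delay at every microphone, consistent with Equation 4.4.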


Figure 4.1: Spectral magnitude of MUSIC for UCA (top) and ULA (bottom). Sources at
(15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA. Sources at 50◦ and 60◦ for ULA.

The MUSIC-Magnitude (MM) spectrum for a planar array is given by

P_MUSIC(Ψ) = 1 / (a^H(Ψ) Qn Qn^H a(Ψ)) = 1 / ||a^H(Ψ) Qn||² = 1 / Σ_{u=1}^{I−L} |a^H(Ψ) qu|²    (4.5)

where Qn is the noise subspace obtained from the eigenvalue decomposition of the autocorrelation matrix Rp = E[p(t)p^H(t)], and qu ∈ Qn is the uth noise eigenvector. The denominator takes a null value when Ψ corresponds to a signal direction. Hence, the MUSIC-Magnitude spectrum P_MUSIC(Ψ) has a peak at the DOA represented by the elevation and azimuth angles (θ, φ). However, when the sources are closely spaced and a limited number of sensors is used, the MUSIC-Magnitude spectrum is unable to resolve them clearly, giving many spurious peaks or a single peak. This is illustrated in Figures 4.1 and 4.4(a) respectively.
The experimental setup for Figures 4.1-4.3 utilizes a UCA of twelve sensors placed on two concentric circles: four sensors on the inner circle and eight on the outer circle. The sources are placed at (15◦,50◦) and (20◦,60◦). Additionally, a corresponding figure utilizing a ULA is also illustrated; the ULA, consisting of eight sensors, is used to estimate only the azimuth of the sources.

Figure 4.2: Spectral phase of MUSIC for UCA (top) and ULA (bottom). Sources at (15◦ ,50◦ )
and (20◦ ,60◦ ) for UCA. Sources at 50◦ and 60◦ for ULA.

To overcome this limitation of MUSIC, the group delay function of the MUSIC spectrum is presented herein for resolving closely spaced sources with a limited number of sensors. The proposed MUSIC-Group delay spectrum for two-dimensional DOA (azimuth and elevation) estimation over planar arrays is defined as

P_MGD(Ψ) = ( Σ_{u=1}^{I−L} |∇ arg(a^H(Ψ) qu)|² ) P_MUSIC(Ψ)    (4.6)

where ∇ arg indicates the gradient of the unwrapped phase spectrum of a^H(Ψ)qu, taken with respect to the spatial variables θ and φ.
Phase spectra of MUSIC for the UCA and the ULA are shown in Figure 4.2. It can be noted from the figure that, in the neighborhood of the DOA, there is a sharp change in the unwrapped phase spectrum for both the UCA and the ULA. Differentiating this unwrapped phase spectrum results in very sharp peaks at the locations of the DOAs. In practice, abrupt changes in phase can also occur due to microphone calibration errors; hence, the differential phase can exhibit a sharp peak at an angle even if it is not a DOA. This differential phase (group delay) spectrum is illustrated in Figure 4.3(a) for the UCA (top) and the ULA (bottom). The MUSIC-Group delay spectrum, being the product of the MUSIC-Magnitude and group delay spectra, removes the spurious peaks and retains only the peaks corresponding to the DOAs, as illustrated in Figure 4.3(b).
Figure 4.3: Illustration of standard group delay of MUSIC and the MUSIC-Group delay as
proposed in this work. (a) Standard group delay spectrum of MUSIC for UCA (top) and ULA
(bottom) (b) MUSIC-Group delay spectrum for UCA (top) and ULA (bottom). Sources are
at (15◦ ,50◦ ) and (20◦ ,60◦ ) for UCA, at 50◦ and 60◦ for ULA.

4.2.2 Spectral Analysis of the MUSIC-Group Delay Function under Reverberant Conditions

In this section, the performance of MUSIC and MUSIC-Group delay in reverberant environments is presented. Performance of subspace-based methods degrades due to multi-path effects. In subspace-based methods like MUSIC, the signal eigenvalues of the received signal correlation matrix are significant compared to the noise eigenvalues. However, because of multi-path effects under reverberation, extraneous eigenvalues become significant. This affects the performance of subspace-based methods, especially MUSIC [69]. A detailed discussion on reverberation is presented in Section 3.3.2.
The MUSIC-Magnitude and MUSIC-Group delay spectra are shown in Figure 4.4 for two sources at (15◦,100◦) and (17◦,105◦), at a reverberation time T60 of 400 ms. The room impulse response (RIR) is simulated by the image method [70], as implemented in [71]. It can be seen that the MUSIC-Group delay spectrum is able to resolve the sources, whereas the MUSIC-Magnitude spectrum gives a single peak. In the following section, the resolving power of the MUSIC-Group delay spectrum for azimuth and elevation estimation is justified by proving the 2-D additive property of the group delay spectrum.


Figure 4.4: Plots illustrating azimuth and elevation angles as estimated by (a) the MUSIC-Magnitude and (b) the MUSIC-Group delay spectrum for sources at (15◦,100◦) and (17◦,105◦), with reverberation time 400 ms. MM estimates a single peak at (18◦,105◦); MGD estimates two peaks at (19◦,100◦) and (17◦,108◦).

4.2.3 Two-dimensional Additive Property of the MUSIC-Group Delay Spectrum

The high resolution of the proposed MUSIC-Group delay is due to the additive property of the MUSIC-Group delay spectrum. For closely spaced sources under reverberation, the peaks corresponding to the DOAs merge into a single peak in the MUSIC spectrum. However, as described in Section 4.2.1, closely spaced sources can be resolved by the MUSIC-Group delay spectrum using a limited number of sensors. This high-resolution property follows from the additive property, since a product in the MUSIC-Magnitude domain is equivalent to an addition in the MUSIC-Group delay domain [19]. The mathematical proof of the additive property of the MUSIC-Group delay spectrum for a ULA has already been dealt with in [19]. For a ULA, the steering vector exhibits a Vandermonde structure, and hence the root-MUSIC polynomial approach is used to show the additive property. This is not the case for a UCA, as is clear from Equations 4.3 and 4.4.
The UCA can be divided into a number of cross sections, where each cross section represents a ULA. A single ULA can estimate only the azimuth angle of arrival; therefore, in general, two ULAs are sufficient to obtain estimates of both the azimuth and elevation angles, and having more than two ULAs improves the robustness of the estimates. For multiple incident signals, pairing of the corresponding estimates from the various ULAs can be carried out as in [72]; other pairing methods for eigenvalue association can be found in [73]. Generalizing the pairing methods of eigenvalue association for a UCA [72, 73], the steering vector a(Ψ) can be expressed as a vector of exponentials

a(Ψ) = [e^{−jωc τ1^{(1)}}  e^{−jωc τ2^{(1)}}  . .  e^{−jωc τn1^{(1)}}  e^{−jωc τ1^{(2)}}  . .  e^{−jωc τn2^{(2)}}  . .]^T    (4.7)


where nr is the number of sensors in the rth cross section of the UCA and Σ_r nr = I. Note that τi^{(r)} is the delay at the ith microphone in the rth cross section of the UCA. The steering vector can now be expressed in Vandermonde structure as

a(Ψ) = [z  z²  . .  z^{n1}  y  y²  . .  y^{n2}  . .]^T    (4.8)

where

z = e^{−jωc τ1^{(1)}};  y = e^{−jωc τ1^{(2)}}.    (4.9)

From Equation 4.5, constructing the root-MUSIC polynomial for the UCA, we have

P_POLY(Ψ) = Σ_{u=1}^{I−L} |a^H(Ψ) qu|².    (4.10)

Utilizing Equation 4.8 and rewriting the root-MUSIC polynomial as a sum of polynomials in z and y, denoted by F(z) and G(y) respectively, we have

P_POLY(Ψ) = F(z) + G(y) + · · ·    (4.11)

For an actual DOA Ψl, the polynomial P_POLY(Ψl), and hence each polynomial corresponding to a cross section of the UCA (e.g. F(z)), becomes zero.
It is to be noted that F(z) is a polynomial in z having (n1 − 1) roots. Among these (n1 − 1) roots, there can be a maximum of L roots corresponding to the L sources. It is also possible for two or more different incident signals to lie on the cone of confusion of a particular ULA, in which case there will be more than (n1 − 1 − L) roots lying very close to the origin of the Z-plane. In either case, the (n1 − 1 − L) roots with magnitude close to zero can be ignored. Constructing a polynomial Y(z) from the L roots corresponding to the L sources, we have

Y(z) = Π_{l=1}^{L} (1 − zl z^{−1}) = 1 + Σ_{l=1}^{L} bl z^{−l}    (4.12)

where zl is the lth root of F(z). It is assumed herein, for mathematical simplicity, that all sources fall in the field of view of the first cross section. Without loss of generality, and to maintain consistency with the definition of the MUSIC method, one can invert Y(z) and express it as a combined resonator H(z), where

H(z) = 1 / (1 + Σ_{l=1}^{L} bl z^{−l}) = 1 / Π_{l=1}^{L} (1 − zl z^{−1}).    (4.13)

This complies with the approach wherein a DOA is viewed as a pole rather than a zero. As we are interested in the group delay spectrum of the combined resonator, H(z) can also be rewritten as a product of poles as shown below

H(α) = Π_{l=1}^{L} rl e^{jγl(α)} = [Π_{l=1}^{L} rl] · exp(j Σ_{l=1}^{L} γl(α))    (4.14)

where rl is the magnitude and γl the phase of the response of the resonator with pole zl. As per the definition in Equation 4.9, γl is a function of the spatial variable α. It may be noted from Equation 4.14 that the combined resonator exhibits a product of the magnitude spectra of the individual resonators, while it exhibits a sum of their phase spectra. Taking the negative derivative of the unwrapped phase spectrum of the combined resonator, we finally have

τH(α) = −(∂/∂α) arg[H(α)] = τH1(α) + τH2(α) + . . . + τHL(α).    (4.15)

It is clear from Equations 4.14 and 4.15 that the MUSIC-Magnitude is a product spectrum, while the MUSIC-Group delay spectrum exhibits the additive property. Due to this additive property, the peaks are preserved in the MUSIC-Group delay spectrum even for closely spaced sources, whereas the MUSIC-Magnitude spectrum fails to preserve them. This is illustrated in Figure 4.5.

Figure 4.5: Two dimensional spectral plots for the cascade of two individual DOAs (res-
onators), (a) Source with DOA (15◦ ,60◦ ) (b) Source with DOA (18◦ ,55◦ ) (c) MUSIC-
Magnitude spectrum (d) MUSIC-Group delay spectrum.

Two individual resonators at DOAs (15◦ ,60◦ ) and (18◦ ,55◦ ) are considered as shown in
Figures 4.5(a) and 4.5(b) respectively. The MUSIC-Magnitude and the MUSIC-Group delay
spectra for the cascade of these two resonators are plotted. It can be noted from Figure
4.5(c) that the magnitude spectrum is unable to resolve the two sources, as the two peaks
are merged due to multiplicative property of magnitude spectrum. On the contrary, the
MUSIC-Group delay spectrum is able to resolve the two sources owing to its 2-D additive
property, as can be seen in Figure 4.5(d).
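The additive property of Equations 4.14-4.15 can also be verified numerically: the phase of a product of resonator responses is the sum of the individual phases, so the group delay of the cascade equals the sum of the individual group delays. A toy check with two assumed poles (the pole radii and phases are illustrative, not values from the thesis experiments):

```python
import numpy as np

alpha = np.linspace(0.1, np.pi - 0.1, 500)                 # spatial variable
p1, p2 = 0.95 * np.exp(1j * 1.0), 0.95 * np.exp(1j * 1.4)  # two "DOA" poles
H1 = 1.0 / (1.0 - p1 * np.exp(-1j * alpha))                # resonator responses
H2 = 1.0 / (1.0 - p2 * np.exp(-1j * alpha))

# Group delay: negative gradient of the unwrapped phase (Eq. 4.15).
gd = lambda H: -np.gradient(np.unwrap(np.angle(H)), alpha)
assert np.allclose(gd(H1 * H2), gd(H1) + gd(H2), atol=1e-8)
```

The magnitudes multiply while the group delays add, which is why closely spaced peaks that merge in the magnitude spectrum survive in the group delay domain.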

4.3 Localization Error Analysis

Subspace-based methods like MUSIC and MUSIC-Group delay are sensitive to finite sample
effects, imprecisely known noise covariance, a perturbed array manifold and reverberation.
Finite sample effects occur since it is not possible to obtain a perfect covariance matrix
R of the received data over an array. In practice, estimation of the sample covariance R̂
requires averaging over several snapshots of the received data. The finite sample effects can be neglected by assuming a high SNR or a large number of snapshots. The error due to an imprecisely known noise covariance is also neglected, in order to analyze the effect of sensor position error and reverberation on the proposed method. In the ensuing section, the performance of MUSIC and
MUSIC-Group delay is presented under sensor perturbation errors. Performance evaluation
is also conducted in a reverberant environment. A numerical analysis is presented comparing
root mean square error (RMSE) of various methods under reverberation with the Cramér-Rao
bound (CRB).

4.3.1 Performance under Sensor Perturbation Error

Let ri be the nominal position of the ith sensor. The position matrix R is formed from the nominal sensor positions as

R = [r1  r2  . . .  rI].

Rewriting the steering vector expression from Equation 3.11, we have

a(Ψl, k) = [e^{−j kl^T r1}, e^{−j kl^T r2}, . . . , e^{−j kl^T rI}]^T.    (4.16)
4.3 Localization Error Analysis 61

The displacement of the ith sensor from its nominal position is distributed as

μi ∼ N(0, σ² I2)

where I2 is the 2 × 2 identity matrix. These position perturbations are assumed to be i.i.d. Gaussian random variables, independent of the signals and of any additive noise at the sensor outputs. In any DOA estimation process, the sensor perturbations are assumed to be time-invariant, i.e., the same perturbation applies for t = 1, 2, . . . , Ns. The position error matrix μ is formed similarly to the position matrix R as

μ = [μ1  μ2  . . .  μI].

Hence, the perturbed sensor positions are given by R̃ = R + μ. The lth steering vector associated with the sensor perturbation can now be written as [74]

ã(Ψl, k) = Γl a(Ψl, k), where

Γl = diag( e^{−j kl^T μ1},  e^{−j kl^T μ2},  . . . ,  e^{−j kl^T μI} ).

Under sensor perturbation error, the signal model in Equation 4.1 turns out to be

p̃(t) = Ã(Ψ, k)s(t) + v(t). (4.17)

The perturbed array manifold Ã(Ψ, k) is given by

Ã(Ψ, k) = [ã(Ψ1, k), ã(Ψ2, k), . . . , ã(ΨL, k)].    (4.18)
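The factorization ã = Γl a can be checked directly: displacing each sensor by μi multiplies the ith steering entry by e^{−j k^T μi}. A numeric sketch in which the 2 mm error level, the 2-D wavevector direction, and the circular geometry are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
I, sigma, ra = 8, 0.002, 0.05                   # 2 mm position error (assumed)
az = 2.0 * np.pi * np.arange(I) / I
Rpos = ra * np.stack([np.cos(az), np.sin(az)])  # nominal 2-D positions, 2 x I
mu = sigma * rng.standard_normal((2, I))        # i.i.d. Gaussian displacements
k = (2 * np.pi * 2000.0 / 343.0) * np.array([np.cos(0.9), np.sin(0.9)])  # wavevector

a_nom = np.exp(-1j * k @ Rpos)                  # nominal steering vector (Eq. 4.16)
a_pert = np.exp(-1j * k @ (Rpos + mu))          # perturbed steering vector
gamma = np.exp(-1j * k @ mu)                    # diagonal of Gamma_l
assert np.allclose(a_pert, gamma * a_nom)       # a_tilde = Gamma_l a
```

The same per-sensor phase factors are what perturb the array autocorrelation matrix in the simulation of [74].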

The effect of the sensor perturbation on the array autocorrelation matrix is simulated as described in [74], and the analysis is carried out. The resolution of the MUSIC-Magnitude and MUSIC-Group delay methods under perturbation errors is illustrated in Figures 4.6(a) and 4.6(b) respectively, which show contour plots of the corresponding spectra. Note that the MUSIC-Magnitude spectrum shows a single peak with contours around it, while the MUSIC-Group delay spectrum shows two distinct peaks with separate contours.


Figure 4.6: Contour plots of (a) the MUSIC-Magnitude spectrum and (b) the MUSIC-Group delay spectrum under sensor perturbation errors, for sources at (θ,φ) = (20◦,45◦) and (15◦,50◦).

4.3.2 Cramér-Rao Bound Analysis

The Cramér-Rao bound provides a lower bound on the mean square error (MSE) of an unbiased estimator of an unknown parameter. The average RMSE of the DOA estimates is compared with the CRB for various methods. The circular array geometry being uncoupled, the statistical coupling between the azimuth and elevation estimates is ignored. The Cramér-Rao inequality for estimating a parameter αr is given as

var(α̂r) ≥ [F^{−1}]_{rr}    (4.19)

where the (r,s)th element of the Fisher information matrix F is given by [75, 68]

Frs = Ns tr{ Rp^{−1} (∂Rp/∂αr) Rp^{−1} (∂Rp/∂αs) }.    (4.20)

For 2-D DOA estimation, the unknown parameter vector is α = [θ, φ]. The elements of the Fisher information matrix are given by

Fθθ = 2Ns Re[(Rs A^H Rp^{−1} A Rs) × (Aθ^H PA⊥ Rp^{−1} Aθ)^T]

Fθφ = 2Ns Re[(Rs A^H Rp^{−1} A Rs) × (Aθ^H PA⊥ Rp^{−1} Aφ)^T]

Fθφ = Fφθ, where

PA⊥ = I − A(A^H A)^{−1} A^H

Aθ = Σ_{l=1}^{L} ∂A/∂θl

and Rp is the received correlation matrix, Rs is the signal correlation matrix, and A is the steering matrix, as defined in Section 4.2.1.
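For a single source the bound of Equations 4.19-4.20 reduces to a scalar, and the derivative ∂Rp/∂φ can be taken by central differences. The following is a numeric sketch on an assumed single-source ULA model, not the thesis's two-source UCA setup; every parameter below is an assumption:

```python
import numpy as np

# Assumed model: 8-sensor half-wavelength ULA, one unit-power source,
# white noise of variance 0.1, Ns = 200 snapshots.
I, kd, sigma2, Ns = 8, np.pi, 0.1, 200
steer = lambda phi: np.exp(-1j * kd * np.cos(phi) * np.arange(I))
Rp = lambda phi: np.outer(steer(phi), steer(phi).conj()) + sigma2 * np.eye(I)

phi0, h = np.deg2rad(60.0), 1e-6
dR = (Rp(phi0 + h) - Rp(phi0 - h)) / (2.0 * h)     # dRp/dphi, central difference
Rinv = np.linalg.inv(Rp(phi0))
F = Ns * np.real(np.trace(Rinv @ dR @ Rinv @ dR))  # Eq. 4.20 with r = s = phi
crb = 1.0 / F                                      # Eq. 4.19, scalar case
assert crb > 0.0                                   # variance bound is positive
```

At this SNR and snapshot count the bound is tiny, consistent with the very small CRB values reported in Table 4.1.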
The azimuth and elevation angles are varied from 10◦ to 150◦ and 10◦ to 80◦ respectively, at reverberation time T60 = 200 ms and SNR = 10 dB. The DOA estimation is performed using MVDR and Beamspace MUSIC (BSM) [76, 47], apart from MUSIC-Magnitude and MUSIC-Group delay. For this simulation, a 15-channel UCA with r = λ (the wavelength) is considered, and the maximum phase mode excited for BSM is taken to be 7. Two closely spaced, uncorrelated sources with 2◦ separation in azimuth and elevation are considered in this analysis. The average RMSE of the azimuth and elevation estimates obtained by the four methods is compared with the average Cramér-Rao bound in Table 4.1. It can be seen that the average RMSE of MUSIC-Group delay is the lowest.

Table 4.1: Comparison of average RMSE of various methods with the CRB (illustrated in the first row) for an azimuth range of 10◦-150◦ and elevation range of 10◦-80◦ at T60 of 200 ms and SNR 10 dB.

Cramér-Rao bound   Azi: 1.4977×10−6   Ele: 1.482×10−6
MGD                Azi: 5.2723        Ele: 5.2329
MM                 Azi: 5.2985        Ele: 5.3316
BSM                Azi: 12.3479       Ele: 9.0577
MVDR               Azi: 12.2562       Ele: 11.1128

4.3.3 Source Localization Error Analysis under Reverberant Environments

Source localization under reverberant environments is challenging, especially for subspace-based methods. In this Section, performance evaluation of the proposed method is conducted in an indoor environment, where the effect of reverberation is prominent. Hence, we consider the small meeting room setup shown in Figure 4.8. The setup has four participants around a table. The error analysis in DOA estimation is presented herein by scatter plots. The reverberation is simulated as discussed in Section 4.4.1. The noise is generated using a zero mean, unit variance Gaussian distribution.

The experiment is conducted under reverberation with a T60 of 150 ms, which typically corresponds to a small meeting room. DOA estimation trials are conducted for two closely spaced sources at (10°,20°) and (5°,10°). An SNR of 40 dB is considered, in order to isolate the effect of reverberation. For 500 independent trials, the azimuth and elevation estimates are plotted in Figure 4.7. In the case of MUSIC-Magnitude, there were several cases where the estimates overlapped each other, leading to poor localization of the sources. Also, the estimates are unevenly distributed around the actual DOA, as illustrated in Figure 4.7(a). Figure 4.7(b) shows the distribution of the estimates of the proposed method. It can be seen that the average estimate is closer to the actual DOA in the case of the proposed method.
Figure 4.7: Two-dimensional scatter plots of localization for the sources at (10°,20°) and (5°,10°) using (a) the MUSIC-Magnitude method and (b) the MUSIC-Group delay method. The reverberation time is 150 ms, the SNR is 40 dB, and the number of trials is 500. The red dot indicates the actual DOA.

4.4 Performance Evaluation

The performance of the proposed method is evaluated by conducting experiments on speech enhancement, perceptual evaluation and distant speech recognition. In the following sections, the speech enhancement experiment is presented as improvement in signal to interference ratio (SIR) [77]. Experiments on perceptual evaluation are also conducted for various methods and quantified using objective measures. The distant speech recognition results are presented as word error rate (WER). The proposed MUSIC-Group delay method is compared with MUSIC-Magnitude (MM), Beamspace MUSIC (BSM) [76, 47], linearly constrained minimum variance (LCMV) and minimum variance distortionless response (MVDR) methods.

4.4.1 Experimental Conditions

The proposed algorithm was tested in a typical meeting room environment. A room with dimensions 730 cm × 620 cm × 340 cm was used in the experiments. The experimental setup consists of a 15-channel uniform circular microphone array with a radius of 10 cm. It has one desired speaker, one competing speaker and two interfering sources, as shown in Figure 4.8.

Figure 4.8: Experimental setup in a meeting room with two speakers (S1 and S2) and two interfering sources (stationary noise source SN and nonstationary noise source NS). The sources are located at (17°,35°), (19°,40°), (15°,30°) and (21°,45°) respectively. The radius of the circular array is 10 cm.

White noise and babble noise from the NOISEX-92 [78] database were used as the stationary and nonstationary interfering sources respectively. The signals are acquired over the array of microphones. Under reverberation, each signal is convolved with the room impulse response (RIR).

In real experimental conditions, a room impulse response can be obtained in two ways: a microphone can be used to record a short impulsive sound, directly giving the room impulse response, or a maximum length sequence (MLS) can be used. In this work, the RIR is simulated using the image method [70] as implemented in [71].
DOAs are estimated using various algorithms over the acquired signals. A filter sum
beamformer (FSB) is trained using the DOA estimates obtained. The signals are recon-
structed using the beamformer. Distant speech recognition (DSR) and speech enhancement
experiments are conducted on the reconstructed speech signal. The complete procedure is
depicted in Figure 4.9.
Estimate DOA → Compute TDOA → Train FSB → DSR/SIR Experiments
Figure 4.9: Flow diagram illustrating the methodology followed in performance evaluation
for distant speech signal acquired over circular array.

4.4.2 Experiments on Speech Enhancement in Multi-source Environment

The performance of the proposed method is presented herein as improvement in SIR. The input SIR of the lth speaker relative to the stationary (sn) or nonstationary (ns) interfering source at microphone m0 is defined as

SIR^x_{in,l} [dB] = 10 log10 [ Σν Σ_{ξ=0}^{NDFT−1} (s^s_l(ν, ξ) h^s_{lm0}(ν, ξ))² / Σν Σ_{ξ=0}^{NDFT−1} (s^x(ν, ξ) h^x_{m0}(ν, ξ))² ]    (4.21)

l ∈ {1, 2},  x ∈ {ns, sn}

where s^s_l(ν, ξ) is the lth speech signal in the short time Fourier transform (STFT) domain with a rectangular window of length NDFT, h^s_{lm0} is the impulse response for the lth speaker and microphone m0 pair, ν is the frame number and ξ is the frequency index.

Table 4.2: Enhancement in SIR (dB), compared for various methods at different reverberation times. S^s_1 is the desired speaker, S^s_2 is the competing speaker, S^ns is the nonstationary noise source and S^sn is the stationary noise source.

                  Input SIR      Output SIR (150 ms)   Output SIR (200 ms)   Output SIR (250 ms)
Method  Source    S^sn   S^ns    S^sn      S^ns        S^sn      S^ns        S^sn      S^ns
MGD     S^s_1     10     5       45.6981   36.0852     40.6738   34.4724     40.2198   33.7506
        S^s_2     10     5       46.0928   43.001      41.8349   35.358      40.3478   21.8217
MM      S^s_1     10     5       42.5781   31.2571     36.575    30.5659     35.0554   30.2473
        S^s_2     10     5       45.5462   29.2422     42.003    25.3432     38.7453   21.7951
BSM     S^s_1     10     5       39.332    27.2701     38.9946   25.8978     38.821    24.5881
        S^s_2     10     5       39.8571   28.7702     38.2909   27.0717     37.9901   25.6131
LCMV    S^s_1     10     5       33.0964   27.365      30.8716   25.2634     30.1894   23.0318
        S^s_2     10     5       34.0859   26.7355     32.0898   25.1449     28.0      23.6274
MVDR    S^s_1     10     5       34.7763   23.0224     26.2894   22.61       25.0518   22.1845
        S^s_2     10     5       33.0054   24.6961     31.058    23.5055     27.7594   19.3616

The output SIR is defined in a similar fashion as

SIR^x_{out,l} [dB] = 10 log10 [ Σν Σ_{ξ=0}^{NDFT−1} (y^s_l(ν, ξ))² / Σν Σ_{ξ=0}^{NDFT−1} (y^x(ν, ξ))² ]    (4.22)

l ∈ {1, 2},  x ∈ {ns, sn}

where y is the reconstructed or beamformed signal. The beamformer used to reconstruct the signal herein was LCMV. The results on SIR improvement are presented in Table 4.2. It can be seen that the proposed method performs better than all the conventional methods.
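Both SIR definitions are energy ratios accumulated over the STFT frames ν and frequency bins ξ. A minimal sketch of Equation 4.22, with hypothetical STFT arrays (frames × bins) whose shapes and values are illustrative only:

```python
import numpy as np

def sir_db(target_stft, interference_stft):
    """Output SIR in dB, Eq. (4.22): 10*log10 of the target-to-interference
    energy ratio, with both energies summed over all frames and bins."""
    num = np.sum(np.abs(target_stft) ** 2)
    den = np.sum(np.abs(interference_stft) ** 2)
    return 10.0 * np.log10(num / den)

# hypothetical beamformed target and residual interference components
y_s = np.full((50, 256), 2.0)
y_x = np.ones((50, 256))
out_sir = sir_db(y_s, y_x)   # 10*log10(4), about 6.02 dB
```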

Table 4.3: Comparison of perceptual evaluation results using various methods. The results are compared based on objective measures.

Method   T60 (ms)   LLR      SegSNR    WSS       PESQ
MGD      150        1.3879   -3.1843   35.5345   2.2819
         250        1.6193   -4.8298   36.7986   2.2229
MM       150        1.62     -3.1847   35.6      2.2815
         250        1.6487   -4.9995   37.73     2.2215
BSM      150        1.6657   -3.2      35.5639   2.28
         250        1.6878   -5.108    38.5765   2.2
LCMV     150        1.668    -3.22     36.2      2.2826
         250        1.6994   -5.095    40.0321   2.1746
MVDR     150        1.67     -3.4      36.4      2.2815
         250        1.7379   -5.0356   40.0647   2.1753

4.4.3 Experiments on Perceptual Evaluation of Enhanced Speech

In this Section, we evaluate the proposed method by computing objective measures of perceptual evaluation on enhanced speech. Here, the desired speaker and stationary noise source pair is considered for evaluation. Six hundred sentences from the TIMIT database [79] were selected and randomized to perform the experiments. The objective measures used herein for evaluating speech quality are the Log-Likelihood Ratio (LLR) [80], segmental SNR (segSNR) [80], Weighted-Slope Spectral (WSS) distance [81] and Perceptual Evaluation of Speech Quality (PESQ) [82]. The results are presented in Table 4.3 at two reverberation levels, T60 = 150 ms and 250 ms. The PESQ and segSNR scores are higher while the LLR and WSS scores are lower for the proposed method, indicating better reconstruction of the signal.

Table 4.4: Comparison of distant speech recognition performance in terms of WER (in percentage) at various reverberation times, T60.

                          Speaker S^s_1           Speaker S^s_2
Database  Method  CTM    150 ms    250 ms       150 ms    250 ms
MONC      MGD     9.2    12.98     23.96        11.99     23.58
          MM             14.21     26.01        13.78     25.56
          BSM            15.02     27.99        15.22     27.32
          LCMV           16.59     29.04        16.3      28.39
          MVDR           17.04     30.16        16.96     29.86
TIMIT     MGD     6.73   8.81      15.79        9.16      16.02
          MM             10.15     18.06        10.92     18.68
          BSM            10.98     19.16        12.1      20.12
          LCMV           12.18     20.44        15.25     21.67
          MVDR           14.08     22.47        17.41     24.37

4.4.4 Experiments on Distant Speech Recognition

Speaker independent large vocabulary speech recognition experiments are conducted for speech acquired over circular microphone arrays [83, 84] in a meeting room scenario. The experimental results are presented as word error rate (WER). The WER is calculated as

WER = 100 − ((Wn − (Ws + Wd + Wi)) / Wn) · 100

where Wn is the total number of words, Ws the total number of substitutions, Wd the total number of deletions, and Wi the total number of insertions.
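The expression above simplifies to WER = 100 · (Ws + Wd + Wi)/Wn. A direct one-function sketch, with illustrative word counts:

```python
def wer_percent(w_n, w_s, w_d, w_i):
    """Word error rate in percent:
    WER = 100 - ((Wn - (Ws + Wd + Wi)) / Wn) * 100."""
    return 100.0 - (w_n - (w_s + w_d + w_i)) / w_n * 100.0

# e.g. 1000 reference words, 50 substitutions, 30 deletions, 20 insertions
rate = wer_percent(1000, 50, 30, 20)   # -> 10.0
```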

To ensure conformity with standard databases, sentences from the TIMIT database [79] were selected. Continuous digit recognition experiments were conducted on the MONC [85] database. Separate sets of sentences were used for training and testing. For TIMIT, the complete test set of 1344 sentences from 112 male and 56 female speakers was used. The rest were used for training the speech models. For MONC, the speech models were trained with 8400 isolated and continuous digit sentences. For testing, 650 continuous digit sentences were used. Three-state, eight-mixture HMMs (Hidden Markov Models) were used in the experiments on the TIMIT database. For the experiments on the MONC database, three-state, sixteen-mixture HMMs were used. Table 4.4 lists the WER for various methods along with the close talking microphone (CTM) as the benchmark. The MUSIC-Group delay method shows a reasonable reduction in WER when compared to the other methods.

4.5 Summary and Contributions

In this Chapter, a novel high resolution source localization method based on the MUSIC-Group delay spectrum is discussed. As indicated by the source localization experiments, the method provides more robust azimuth and elevation estimates of closely spaced sources than conventional source localization methods. The significance of the MUSIC-Group delay method for speech enhancement and distant speech recognition is also illustrated by the improvements in signal to interference ratio and the lower word error rates.
Chapter 5

Far-field Source Localization over Spherical Microphone Array

5.1 Introduction

After the introduction of the higher order spherical microphone array (SMA) and the associated signal processing in [10, 11], the spherical microphone array is being widely used for direction of arrival (DOA) estimation [12, 13, 14, 15, 16, 17, 86], tracking of acoustic sources [33] and sound field decomposition [87]. The growing research interest in spherical microphone arrays can be attributed to the ability of such arrays to measure and analyze three-dimensional sound fields in an effective manner. In other words, an SMA can localize sound sources anywhere in space. Additionally, the beampattern can be steered to any direction in three-dimensional (3-D) space without changing the shape of the pattern. Hence, a spherical microphone array allows full 3-D control of the beampattern. Another advantage of such arrays is the ease of array processing in the spherical harmonics (SH) domain.
In this chapter, novel far-field source localization methods over a spherical microphone array are presented. The chapter starts with a discussion on the fundamentals of spherical array processing. The spherical Fourier transform and beampattern analysis in the spherical harmonics domain are introduced. The development of the far-field array data model, from the spatio-temporal domain to the spherical harmonics domain, follows. Thereafter, formulations for the existing conventional source localization methods, spherical harmonics minimum variance distortionless response (SH-MVDR) [16] and spherical harmonics MUltiple SIgnal Classification (SH-MUSIC) [16, 15], are presented. Finally, a high resolution source localization method for spherical microphone arrays is proposed using the spherical harmonics MUSIC-Group delay (SH-MGD) spectrum. Several experiments are conducted for 3-D source localization in noisy and reverberant environments. Additional experiments on source tracking are also conducted. The performance of the SH-MGD method is compared to other conventional methods in the performance evaluation section. Root mean square error (RMSE), probability of resolution and average error distribution (AED) are utilized for evaluating the proposed method.

5.2 Fundamentals of Spherical Array Processing

In this Section, a background to spherical array signal processing is presented. The spherical Fourier transform (SFT) is an essential component of spherical array signal processing in the spherical harmonics domain. To start with, two assumptions are made. First, it is assumed that the sound pressure on the entire sphere is known. This assumption is not true in practice, where the pressure is sampled spatially using microphones. Sampling weights based on certain sampling criteria are introduced to take this into account [88]. Second, it is assumed that the sound field is composed of plane waves. This is approximately true when the sound has traveled a sufficient distance from the source.

5.2.1 The Spherical Fourier Transform

Let us consider a spherical microphone array with I identical and omnidirectional microphones, mounted on the surface of a sphere with radius ra. The position vector of the ith microphone is given by

ri = [ra sin θi cos φi,  ra sin θi sin φi,  ra cos θi]^T    (5.1)

where θ is the elevation angle, φ is the azimuth angle and (.)^T denotes the transpose of (.). The spherical microphone array is assumed to be of order N. The order of the array is defined in Section 2.5.1. The spherical Fourier transform under the aforementioned definition of the spherical microphone array is now detailed.

Figure 5.1: Computation of spherical Fourier transform over sphere with radius r = 1

Let the pressure received at (r, Φ) = (r, θ, φ) be denoted by p(t, r, Φ) ↔ P(k, r, Φ), with r ≥ ra and k the wavenumber. The spherical Fourier transform (SFT) [89] or spherical harmonics decomposition [15] of the received pressure is

Pnm(k, r) = ∫_{Ω∈S²} P(k, r, Φ)[Ynm(Φ)]* dΩ    (5.2)

where Ynm is the spherical harmonic of order n and degree m, dΩ = sin θ dθ dφ is the elemental area over the sphere of unit radius as shown in Figure 5.1, and (.)* denotes the complex conjugate of (.). The spherical harmonics Ynm can be written from Section 2.5 as

Ynm(Φ) = √[(2n + 1)(n − m)! / (4π(n + m)!)] Pn^m(cos θ) e^{jmφ},  ∀ 0 ≤ n ≤ N, −n ≤ m ≤ n    (5.3)

with Pn^m being the associated Legendre functions. Substituting for dΩ, the SFT can be expressed as

Pnm(k, r) = ∫₀^{2π} ∫₀^{π} P(k, r, Φ)[Ynm(Φ)]* sin(θ) dθ dφ.    (5.4)

In practice, the received pressure is not continuous; it is spatially sampled at the microphone locations. Hence, the SFT of the pressure is approximated by the summation

Pnm(k, r) ≅ Σ_{i=1}^{I} ai Pi(k, r, Φi)[Ynm(Φi)]*.    (5.5)

In matrix form, for all n ∈ [0, N], m ∈ [−n, n] and i ∈ [1, I], the SFT becomes

Pnm(k, r) ≅ Y^H(Φ) Γ P(k, r, Φ)    (5.6)

where Pnm = [P00, P1(−1), P10, P11, · · · , PNN]^T is an (N + 1)² × 1 vector and Y(Φ) is an I × (N + 1)² matrix whose ith row is given as

y(Φi) = [Y00(Φi), Y1−1(Φi), Y10(Φi), Y11(Φi), . . . , YNN(Φi)].    (5.7)

Γ = diag(a1, a2, · · · , aI) is an I × I matrix of sampling weights [88], P(k, r, Φ) is the I × 1 vector of pressures at the I microphones, and (.)^H denotes the conjugate transpose of (.).
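The sampled SFT of Equations 5.5 and 5.6 can be sketched directly. The spherical harmonic of Equation 5.3 is built here from the associated Legendre function; the Gauss-Legendre grid and weights below are one illustrative choice of sampling scheme for order N, not the thesis's actual microphone layout:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, th, ph):
    """Spherical harmonic Y_n^m, Eq. (5.3); th = colatitude, ph = azimuth."""
    if m < 0:
        return (-1) ** (-m) * np.conj(Ynm(n, -m, th, ph))
    c = np.sqrt((2 * n + 1) * factorial(n - m) / (4 * np.pi * factorial(n + m)))
    return c * lpmv(m, n, np.cos(th)) * np.exp(1j * m * ph)

def sft(N, th, ph, weights, pressure):
    """Discrete SFT, Eqs. (5.5)-(5.6): Pnm = Y^H(Phi) Gamma p."""
    Y = np.stack([Ynm(n, m, th, ph)
                  for n in range(N + 1) for m in range(-n, n + 1)], axis=-1)
    return Y.conj().T @ (weights * pressure)

# Gauss-Legendre sampling in cos(theta), uniform in phi (illustrative grid)
N = 3
x, w = np.polynomial.legendre.leggauss(N + 1)
nphi = 2 * (N + 1)
th = np.repeat(np.arccos(x), nphi)
ph = np.tile(2 * np.pi * np.arange(nphi) / nphi, N + 1)
wt = np.repeat(w, nphi) * (2 * np.pi / nphi)

# a field consisting of the single harmonic Y_2^1 yields a unit coefficient
coeffs = sft(N, th, ph, wt, Ynm(2, 1, th, ph))
```

The flat index of coefficient (n, m) in `coeffs` is n² + n + m, matching the stacking order of Equation 5.7.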
The inverse spherical Fourier transform relation is given by

P(k, r, Φ) ≅ Σ_{n=0}^{N} Σ_{m=−n}^{n} Pnm(k, r) Ynm(Φ).    (5.8)

Observing Equations 5.7 and 5.8, P(k, r, Φ) of the highest order N on the surface of a sphere has (N + 1)² independent harmonic components. Hence, we can sample a sound field of order N with at least (N + 1)² points on the sphere without losing information. In other words, the number of microphones must satisfy [11]

I ≥ (N + 1)².    (5.9)

5.2.2 Beampattern Analysis in Spherical Harmonics Domain

Beampatterns for the uniform linear array (ULA) and uniform circular array (UCA) are given in Section 3.4.2.3. As discussed therein, the beampattern is typically measured as the array response to a single plane wave. Hence, consider a sound field composed of a single plane wave with unit amplitude, incident from direction Ψl = (θl, φl). In this case, utilizing Equation 2.50 and the definition of the inverse SFT, Pnm can be written as

Pnm(k, r) = bn(k, r)[Ynm(Ψl)]*    (5.10)

where bn(k, r) is called the mode strength. The expression and significance of the mode strength for the open sphere and rigid sphere are detailed in Section 2.5.1. By definition, the expression
in Equation 5.10 can be regarded as a steering vector component in the spherical harmonics domain. This is shown mathematically in Equation 5.34.

Figure 5.2: Illustration of the spherical harmonics beampatterns: (a) regular beampattern for order N = 3, (b) regular beampattern for order N = 4, (c) DSB beampattern for order N = 3 and (d) DSB beampattern for order N = 4.

Hence, similar to Equation 3.46, the expression for the beampattern in the spherical harmonics domain can be written as


G = | Σ_{n=0}^{N} Σ_{m=−n}^{n} Wnm*(k) Pnm(k, r) |    (5.11)

where Wnm*(k) is the complex conjugate of the SFT of the beamforming weights and |(.)| is the absolute value of (.). In matrix form,

G = | Wnm^H(k) Pnm(k, r) |    (5.12)

where Wnm = [W00, W1(−1), W10, W11, · · · , WNN]^T.


For beampatterns that are rotationally symmetric around array look direction, the beam-
forming weight is given by [90]

∗ dn m
Wnm (k) = [Y (Ψs )] (5.13)
bn n

where dn controls the beampattern and Ψs is array look direction (also called steering direc-
tion) [10]. Utilizing Equations 5.13 and 5.10 in 5.11, the beampattern expression becomes

�� n
N � �
� �
G(Ψl , Ψs ) = � dn Ynm (Ψs )[Ynm (Ψl )]∗ �. (5.14)
n=0 m=−n

where Ψl is varied over the field of view of the array to obtain the array response. The spherical harmonics addition theorem [91] gives

G(Θ) = | Σ_{n=0}^{N} dn ((2n + 1) / (4π)) Pn(cos Θ) |    (5.15)

where Pn(.) is the Legendre polynomial and Θ is the angle between the source direction and the array look direction.

Various choices of dn lead to different beampatterns. The beampattern achieved using dn = 1 is called the regular beampattern [92]. For the delay-and-sum beampattern (DSB), the controlling parameter takes the values dn = |bn(k, r)|² [50]. Regular and delay-and-sum beampatterns are shown in Figure 5.2.
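Equation 5.15 reduces the axisymmetric beampattern to a Legendre series in cos Θ, so either choice of dn can be evaluated in a few lines. The order and angular grid below are illustrative:

```python
import numpy as np
from scipy.special import eval_legendre

def beampattern(dn, Theta):
    """Axisymmetric beampattern, Eq. (5.15):
    G(Theta) = | sum_n dn (2n+1)/(4 pi) P_n(cos Theta) |."""
    dn = np.asarray(dn, dtype=float)
    n = np.arange(len(dn))
    Theta = np.atleast_1d(np.asarray(Theta, dtype=float))
    Pn = eval_legendre(n[:, None], np.cos(Theta)[None, :])
    return np.abs(np.sum((dn * (2 * n + 1) / (4 * np.pi))[:, None] * Pn, axis=0))

# regular beampattern (dn = 1) of order N = 3; maximum at look direction Theta = 0
N = 3
theta = np.linspace(0.0, np.pi, 181)
G = beampattern(np.ones(N + 1), theta)
```

For dn = 1 the peak value at Θ = 0 is Σ(2n + 1)/(4π) = (N + 1)²/(4π), consistent with the (N + 1)² harmonic components of an order-N field.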

5.3 Microphone Array Data Model in Spherical Harmonics Domain

In this Section, the data model for the received pressure is derived in the spherical harmonics domain. The spatio-temporal data model, derived in Section 3.3, is used herein to derive the spatio-frequency and subsequently the spherical harmonics data model.

5.3.1 Data Model in Spatial Domain

Let us consider L narrowband, far-field sources incident over a spherical microphone array with I microphones. The microphones are mounted on the surface of a sphere with radius ra. The amplitude of the lth source is given by sl(t). The time delay of arrival at the center of the sphere is taken to be zero. Under the far-field source and omnidirectional sensor assumptions, the pressure at the ith microphone due to the lth source will be sl(t − τi(Ψl)), where τi(Ψl) is the propagation delay between the reference point and the ith microphone for the lth source impinging from direction Ψl. Hence, the total pressure at the ith microphone can be expressed as

pi(Ψ; t) = Σ_{l=1}^{L} sl(t − τi(Ψl)) + vi(t)    (5.16)

where vi(t) is the sensor noise at the ith sensor and t = 1, 2, · · · , Ns, with Ns being the number of snapshots. The data model in Equation 5.16 is referred to as the spatio-temporal data model.

Suppose that the microphone output pi(Ψ; t) is sampled with a sampling frequency of 1/Ts Hz. In general, if s(t) is band-limited to the interval [fl, fu], then fu ≤ 1/(2Ts). Computing the discrete Fourier transform (DFT) of Equation 5.16, the spatio-frequency data model can be written as

Pi(Ψ; fν) = Σ_{l=1}^{L} e^{−j2πfν τi(Ψl)} Sl(fν) + Vi(fν),  ν = 1, · · · , Ns    (5.17)

where the frequency fν is related to the FFT index ξν of the DFT as

fν = ξν / (Ts Ns).    (5.18)

Utilizing ωτi(Ψl) = kl^T ri from Equation 2.12 and dropping ν for notational simplicity, Equation 5.17 can be re-written in the wavenumber (hence frequency) domain as

Pi(Ψ; k) = Σ_{l=1}^{L} e^{−j kl^T ri} Sl(k) + Vi(k).    (5.19)

Rearranging Equation 5.19 in matrix form, the final data model in the spatial domain can be written as

P(Ψ; k) = A(Ψ; k) S(k) + V(k)    (5.20)

where A(Ψ; k) is the I × L steering matrix, S is the L × Ns signal matrix and V is the I × Ns matrix of uncorrelated sensor noise. The noise components are assumed to be white, circularly Gaussian distributed with zero mean and covariance matrix σ²I, I being the identity matrix. The steering matrix can be expanded as

A(Ψ; k) = [a1, a2, . . . , aL],  where

al = [e^{−j kl^T r1}, e^{−j kl^T r2}, . . . , e^{−j kl^T rI}]^T.    (5.21)

It is to be noted that the spatio-frequency data model in Equation 5.20 is similar to the spatio-temporal data model derived in Equation 3.9.

5.3.2 Data Model in Spherical Harmonics Domain

The motivation to work in the spherical harmonics domain comes from reduced dimensionality and ease of array processing. Recollect from Section 2.5.1 that each term e^{−j kl^T ri} represents a plane wave model in the spherical coordinate system. Hence, the steering vector component for a spherical microphone array in the spatial domain can be written from Equation 2.50 as

ail = e^{−j kl^T ri} = Σ_{n=0}^{N} Σ_{m=−n}^{n} bn(k, r)[Ynm(θl, φl)]* Ynm(θi, φi).    (5.22)

From Equations 5.22 and 5.21, the spatial steering matrix can be written in terms of spherical harmonics as

A(Ψ; k) = Y(Φ) B(k, r) Y^H(Ψ)    (5.23)

where Y(Φ) is an I × (N + 1)² matrix whose ith row, defined in Equation 5.7, is the collection of all the spherical harmonics. The L × (N + 1)² matrix Y(Ψ) can be expanded on similar lines by replacing Φi with Ψl in Equation 5.7. The (N + 1)² × (N + 1)² matrix B(k, r) is given by

B(k, r) = diag(b0(k, r), b1(k, r), b1(k, r), b1(k, r), . . . , bN(k, r)).    (5.24)

A particular mode strength bn(k, r) of order n is defined for the open sphere and rigid sphere in Equation 2.51.
Substituting the expression for the steering matrix from Equation 5.23 into Equation 5.20, multiplying both sides by Y^H(Φ)Γ and utilizing Equation 5.6, the data model becomes

Pnm(Ψ; k) = Y^H(Φ) Γ Y(Φ) B(k, r) Y^H(Ψ) S(k) + Vnm(k).    (5.25)

Orthogonality of the spherical harmonics under spatial sampling suggests [88]

Y^H(Φ) Γ Y(Φ) ≅ I.    (5.26)

Hence, the data model in the spherical harmonics domain turns out to be

Pnm(Ψ; k) = B(k, r) Y^H(Ψ) S(k) + Vnm(k).    (5.27)

It is to be noted that B(k, r) is constant for a given array geometry and frequency of operation. It is invertible for array geometries like the rigid and dual sphere [16]. Hence, multiplying both sides of Equation 5.27 by B^{-1}(k, r), we have the final data model in the spherical harmonics domain as

Dnm(Ψ; k) = Y^H(Ψ) S(k) + Znm(k)

[Dnm]_{(N+1)²×Ns} = [Y^H]_{(N+1)²×L} [S]_{L×Ns} + [Znm]_{(N+1)²×Ns}    (5.28)

where

Dnm(Ψ; k) = B^{-1}(k, r) Pnm(Ψ; k)    (5.29)

Znm(k) = B^{-1}(k, r) Vnm(k) = η(k) V(k)    (5.30)

η(k) = B^{-1}(k, r) Y^H(Φ) Γ.    (5.31)

It must be noted that η(k) is known for a given array geometry and frequency of operation.

Comparing the spatio-frequency data model in Equation 5.20 and the spherical harmonics data model in Equation 5.28, the steering matrix in the spherical harmonics domain turns out to be Anm(Ψ) = Y^H(Ψ). Hence, a particular steering vector can be written as

anm(Ψl) = y^H(Ψl) = [Y00*(Ψl), Y1(−1)*(Ψl), Y10*(Ψl), Y11*(Ψl), . . . , YNN*(Ψl)]^T.    (5.32)

One component of the steering vector can be expressed as

anm = Ynm(Ψl).    (5.33)

For the data model in Equation 5.27, the steering vector component will be

anm = bn(k, r) Ynm(Ψl).    (5.34)

5.4 Advantage of Array Data Model Formulation in Spherical Harmonics Domain

The formulation of various problems in the spatial domain and the spherical harmonics domain is similar [50]. Hence, results from the spatial domain can be applied directly in the spherical harmonics domain. Additionally, array processing in the spherical harmonics domain has some exclusive advantages over the spatial domain.

5.4.1 Reduced Dimensionality

Comparing the spatio-frequency data model in Equation 5.20 with the spherical harmonics data model in Equation 5.28, we observe that the dimensionality of the data is reduced from I to (N + 1)², as indicated by the relation in Equation 5.9. This is achieved by a simple multiplication of the spatio-frequency data model by B^{-1}(k, r) Y^H(Φ) Γ. Hence, the spherical harmonics formulation is computationally more efficient.

5.4.2 Frequency Smoothing

The steering matrix in the spherical harmonics domain assumes the form Y^H(Ψ), which is frequency independent. Due to this frequency independence, frequency smoothing can be performed, which restores the rank of the signal covariance matrix [16]. The subspace based method MUSIC requires a full rank signal covariance matrix, which is not available when the sources are correlated. MVDR requires a full rank array covariance matrix, as it involves inverting the covariance matrix.

Utilizing the spherical harmonics data model in Equation 5.28, the model array covariance matrix can be written as

RDnm(k) = E[Dnm(k) Dnm^H(k)]
        = Y^H(Ψ) RS(k) Y(Ψ) + RZnm(k)    (5.35)

where RS(k) = E[S(k) S^H(k)] is the signal covariance matrix. Utilizing Equation 5.30, the model noise covariance matrix is given as RZnm(k) = σ² η(k) η^H(k). It can be noted from Equation 5.35 that frequency smoothing of the model array covariance matrix can be performed by averaging RDnm over frequency, which smooths the signal covariance matrix since the steering matrix is frequency independent. Such rank restoration is not possible in the spatial data model, as there the steering matrix is frequency dependent. The frequency smoothed covariance matrix can be written as

R̃Dnm = (1/Ns) Σ_{ν=1}^{Ns} RDnm(kν)
     = Y^H(Ψ) R̃S Y(Ψ) + σ² Σ    (5.36)


where

Σ = (1/Ns) Σ_{ν=1}^{Ns} η(kν) η^H(kν)    (5.37)

R̃S = (1/Ns) Σ_{ν=1}^{Ns} RS(kν).    (5.38)

Theoretically, it is sufficient to restore the rank of the model covariance matrix up to the maximum rank of the signal covariance matrix, which is L. Therefore, averaging across L frequencies may be sufficient. In practice, however, averaging is done over a larger number of frequencies to improve the estimation.
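Because the SH-domain steering matrix is frequency independent, the per-bin sample covariances can simply be averaged, as in Equation 5.36. In the toy sketch below, two fixed vectors stand in for the steering vectors of two fully coherent sources (illustrative values, not a real array): each per-bin covariance is rank 1, while the smoothed covariance recovers rank L = 2:

```python
import numpy as np

def frequency_smoothed_cov(snapshots_per_bin):
    """Eq. (5.36): average the sample covariance R_Dnm(k_nu) over bins."""
    covs = [D @ D.conj().T / D.shape[1] for D in snapshots_per_bin]
    return sum(covs) / len(covs)

# two coherent sources with frequency-dependent gains (toy steering vectors)
a1 = np.array([1.0, 0.0, 0.0, 0.0])
a2 = np.array([0.0, 1.0, 0.0, 0.0])
bins = [np.outer(a1 + a2, [1.0]),    # one snapshot at bin k_1
        np.outer(a1 - a2, [1.0])]    # one snapshot at bin k_2
R_smooth = frequency_smoothed_cov(bins)
```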

5.4.3 Ease of Beamforming

Processing in the spherical harmonics domain provides ease of beamforming, due to the reduced dimensionality of the array covariance matrix and the simple structure of the steering vector components. Here, we present the weights for MVDR beamforming. The problem formulation for the beamforming weights is similar to that in the spatial domain, as presented in Section 3.4.2.2. The MVDR beamforming problem is given by

min_{Wnm}  Wnm^H RDnm Wnm   subject to   Wnm^H anm = 1.    (5.39)

The solution to this optimization problem is [93]

Wnm = RDnm^{-1} anm / (anm^H RDnm^{-1} anm).    (5.40)

It is to be noted that Wnm and RDnm are of lower dimension when compared to their spatial domain counterparts. Also, comparing Equations 5.33 and 5.22, the steering vector component has a simple form in the spherical harmonics domain, while it involves a double summation in the spatial domain.

5.5 Far-field Source Localization using Spherical Microphone Array

The data model in Equation 5.28 corresponds to the spherical harmonics data model for sources in the far-field, and is utilized herein for source localization. Formulations for spherical harmonics MVDR (SH-MVDR) and spherical harmonics MUSIC (SH-MUSIC) are presented first. Subsequently, a high resolution source localization method, spherical harmonics MUSIC-Group delay (SH-MGD), is proposed.

5.5.1 Spherical Harmonics MVDR Method

Utilizing the weights defined in Equation 5.40, the power spectrum of MVDR in the spherical harmonics domain can be written as [16]

PSH−MVDR(Ψ) = 1 / (anm^H(Ψ) RDnm^{-1} anm(Ψ)).    (5.41)

The DOA estimates are given by the L largest peaks in the SH-MVDR power spectrum, corresponding to the L sources. As a spatial filter, SH-MVDR steered to a certain DOA Ψs attenuates any other signal impinging on the array from a DOA ≠ Ψs. The performance of SH-MVDR is limited when the sources are closely spaced. This is illustrated in Figure 5.3: the SH-MVDR spectrum is unable to resolve two closely spaced sources at (20°,50°) and (15°,60°) at an SNR of 10 dB. An open sphere is taken for the simulation.

Figure 5.3: SH-MVDR spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB
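Equation 5.41 can be sketched directly once the steering vector of Equation 5.32 is available. The helper below rebuilds the spherical harmonics from the associated Legendre function; the source direction, source power and noise level are illustrative values, not the simulation settings above:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, th, ph):
    """Spherical harmonic Y_n^m (th = colatitude, ph = azimuth)."""
    if m < 0:
        return (-1) ** (-m) * np.conj(Ynm(n, -m, th, ph))
    c = np.sqrt((2 * n + 1) * factorial(n - m) / (4 * np.pi * factorial(n + m)))
    return c * lpmv(m, n, np.cos(th)) * np.exp(1j * m * ph)

def steering(N, th, ph):
    """a_nm(Psi), Eq. (5.32): conjugated spherical harmonics up to order N."""
    return np.array([np.conj(Ynm(n, m, th, ph))
                     for n in range(N + 1) for m in range(-n, n + 1)])

def sh_mvdr(R, N, th, ph):
    """SH-MVDR power spectrum, Eq. (5.41): 1 / (a^H R^{-1} a)."""
    a = steering(N, th, ph)
    return 1.0 / np.real(a.conj() @ np.linalg.solve(R, a))

# ideal modal covariance for one far-field source (illustrative direction)
N = 2
a0 = steering(N, 0.9, 1.0)
R = 10.0 * np.outer(a0, a0.conj()) + 0.1 * np.eye((N + 1) ** 2)
p_on = sh_mvdr(R, N, 0.9, 1.0)    # peak at the source direction
p_off = sh_mvdr(R, N, 2.0, 4.0)   # much lower away from it
```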

5.5.2 Spherical Harmonics MUSIC Method

The MUSIC-Magnitude spectrum in the spherical harmonics domain (SH-MUSIC) is formulated as [16]

PSH−MUSIC(Ψ) = 1 / (anm^H(Ψ) Qnm Qnm^H anm(Ψ))    (5.42)

where anm(Ψ) is the steering vector defined in Equation 5.32, and Qnm is the noise subspace obtained from the eigenvalue decomposition of the modal covariance matrix RDnm computed in Equation 5.35. The denominator becomes zero when Ψ corresponds to a DOA, owing to the orthogonality between the noise eigenvectors and the steering vector. Hence, we get a peak in the SH-MUSIC spectrum. However, when sources are closely spaced, the SH-MUSIC spectrum is unable to resolve them accurately, giving many spurious peaks. Figure 5.4 illustrates the SH-MUSIC spectrum for an Eigenmike system [39]. The simulation considers an open sphere with sources at (20°,50°) and (15°,60°) and SNR = 10 dB.


Figure 5.4: SH-MUSIC spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB
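A sketch of Equation 5.42 under the same conventions: the noise subspace comes from the eigendecomposition of an ideal modal covariance built for two sources at illustrative directions, and a small eps guards the division at the true DOAs:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, th, ph):
    """Spherical harmonic Y_n^m (th = colatitude, ph = azimuth)."""
    if m < 0:
        return (-1) ** (-m) * np.conj(Ynm(n, -m, th, ph))
    c = np.sqrt((2 * n + 1) * factorial(n - m) / (4 * np.pi * factorial(n + m)))
    return c * lpmv(m, n, np.cos(th)) * np.exp(1j * m * ph)

def steering(N, th, ph):
    """a_nm(Psi), Eq. (5.32): conjugated spherical harmonics up to order N."""
    return np.array([np.conj(Ynm(n, m, th, ph))
                     for n in range(N + 1) for m in range(-n, n + 1)])

def sh_music(R, N, L, th, ph):
    """SH-MUSIC pseudo-spectrum, Eq. (5.42): 1 / ||Q^H a||^2, with Q the
    eigenvectors of the (N+1)^2 - L smallest eigenvalues of R."""
    _, V = np.linalg.eigh(R)                    # eigenvalues in ascending order
    Q = V[:, : (N + 1) ** 2 - L]
    a = steering(N, th, ph)
    return 1.0 / (np.linalg.norm(Q.conj().T @ a) ** 2 + 1e-12)

# ideal modal covariance for two uncorrelated sources (illustrative directions)
N, L = 2, 2
a1, a2 = steering(N, 0.35, 0.87), steering(N, 0.26, 1.05)
R = 10 * np.outer(a1, a1.conj()) + 10 * np.outer(a2, a2.conj()) \
    + 0.1 * np.eye((N + 1) ** 2)
p_on = sh_music(R, N, L, 0.35, 0.87)   # sharp peak at a true DOA
p_off = sh_music(R, N, L, 1.5, 3.0)
```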

5.5.3 Spherical Harmonics MUSIC-Group Delay Method

The SH-MUSIC spectrum gives many spurious peaks for closely spaced sources and hence
determining the candidate peak becomes challenging. It is to be noted from SH-MUSIC
expression that it utilizes magnitude spectra of anm H (Ψ)Qnm . To overcome the limitation
of SH-MUSIC, MUSIC-Group delay is formulated in spherical harmonics domain. This is
5.5 Far-field Source Localization using Spherical Microphone Array 84

called spherical harmonics MUSIC-Group delay. It utilizes the differential of phase spectrum
(group delay) of SH-MUSIC. The spherical harmonics formulation of MUSIC-Group delay is
given by
U
�� �
PSH−M GD (Ψ) = |∇arg(anm H (Ψ)qu )|2 PSH−M U SIC (Ψ) (5.43)
u=1

where U = (N + 1)² − L, ∇ is the gradient operator, arg(·) indicates the unwrapped phase, and q_u represents the u-th eigenvector of the noise subspace Q_{nm}. The summation term is the group delay spectrum. The gradient is taken with respect to the spatial variable Ψ = (θ, φ). Being a product spectrum, SH-MGD removes the spurious peaks, while the prominent peaks corresponding to the DOAs are retained, as illustrated in Figure 5.5. In addition, the group delay of MUSIC follows an additive property, which enables the group delay spectrum to preserve the peaks better than the magnitude spectrum, which follows a multiplicative property. A mathematical proof of the additive property of the spatial domain MUSIC-Group delay spectrum is provided in Section 4.2.3. All the spatial domain results are also valid in the spherical harmonics domain [50]. Hence, the additive property of the group delay spectrum also holds in the spherical harmonics domain.
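The group-delay product spectrum can be sketched in one dimension. The following minimal illustration (a sketch under assumed parameters, not the thesis code) scans azimuth at the true elevation for a single source, unwraps the phase of a^H(φ) q_u for each noise eigenvector, and forms the product spectrum of Equation 5.43.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, theta, phi):
    ma = abs(m)
    c = np.sqrt((2*n + 1) * factorial(n - ma) / (4*np.pi * factorial(n + ma)))
    y = c * lpmv(ma, n, np.cos(theta)) * np.exp(1j * ma * phi)
    return (-1)**ma * np.conj(y) if m < 0 else y

def sh_steering(N, theta, phi):
    return np.array([np.conj(Ynm(n, m, theta, phi))
                     for n in range(N + 1) for m in range(-n, n + 1)])

N = 3
th0, ph0 = np.deg2rad(20.0), np.deg2rad(50.0)
a0 = sh_steering(N, th0, ph0)
R = np.outer(a0, a0.conj()) + 0.01 * np.eye((N + 1)**2)
_, V = np.linalg.eigh(R)
Q = V[:, :-1]                                      # noise subspace

phi_grid = np.linspace(0.0, 2*np.pi, 1441)         # 0.25 degree grid
A = np.array([sh_steering(N, th0, p) for p in phi_grid])

S = A.conj() @ Q                                   # a^H(phi) q_u for every u
p_music = 1.0 / (np.sum(np.abs(S)**2, axis=1) + 1e-12)

# group-delay term: squared gradient of the unwrapped phase, summed over u
gd = np.zeros(len(phi_grid))
for u in range(S.shape[1]):
    phase = np.unwrap(np.angle(S[:, u]))
    gd += np.gradient(phase, phi_grid)**2

p_mgd = gd * p_music                               # Eq. (5.43), azimuth cut
phi_hat = phi_grid[np.argmax(p_mgd)]
```

Both factors peak at the DOA: the MUSIC factor because of subspace orthogonality, and the group-delay factor because the phase of a^H(φ) q_u changes rapidly through the null.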
Figure 5.5: SH-MGD spectrum for sources at (20◦ ,50◦ ) and (15◦ ,60◦ ), SNR=10 dB

5.5.4 Noise Whitening

The formulation of MUSIC requires the noise to be spatially white. However, even when the sensor noise is spatially white, the modal noise covariance matrix R_{Z_{nm}}(k) is not. Hence, whitening is required before applying SH-MUSIC or SH-MGD. Whitening the smoothed modal array covariance matrix results in [16],

\tilde{R}^{w}_{D_{nm}} = \Sigma^{-1/2}\, \tilde{R}_{D_{nm}}\, \Sigma^{-1/2}, \qquad (5.44)

and the whitened steering vector is

a_{nm}(\Psi_l) = \Sigma^{-1/2}\, y^{H}(\Psi_l) \qquad (5.45)

where Σ is computed from Equation 5.37. The whitened modal array covariance matrix and steering vector are used in the computation of SH-MUSIC and SH-MGD.
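The whitening step amounts to a Hermitian inverse matrix square root. The sketch below uses a random positive definite matrix as a stand-in for the Σ of Equation 5.37 (which is not reproduced in this section) and verifies that Σ^{-1/2} Σ Σ^{-1/2} = I.

```python
import numpy as np

def inv_sqrtm(Sigma):
    # Hermitian inverse square root via eigendecomposition
    w, V = np.linalg.eigh(Sigma)
    return (V * (1.0 / np.sqrt(w))) @ V.conj().T

rng = np.random.default_rng(0)
M = 9                                   # (N+1)^2 modal channels for N = 2
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = B @ B.conj().T + M * np.eye(M)  # stand-in for the Sigma of Eq. (5.37)

W = inv_sqrtm(Sigma)                    # Sigma^{-1/2}
noise_white = W @ Sigma @ W             # whitened modal noise covariance

# the same W whitens the covariance (Eq. 5.44) and the steering vectors (Eq. 5.45)
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
a_w = W @ a
```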

5.6 Formulation of Stochastic Cramér-Rao Bound for Far-


field Sources

Cramér-Rao bound (CRB) places a lower bound on the variance of an unbiased estimator.
It provides a benchmark against which any estimator is evaluated. Although various source localization algorithms have been proposed in the spherical harmonics domain [13, 14, 15, 16, 17, 86], literature on the Cramér-Rao bound in the spherical harmonics domain is scarce. Hence, it is of considerable interest to develop an expression for the Cramér-Rao bound in the spherical harmonics domain.
In [75], a CRB expression was derived for the case of a ULA, but without using the theory of the CRB. This is addressed in [94], which provides a textbook derivation of the stochastic CRB. Explicit CRBs for azimuth and elevation are developed in [95, 68] for planar arrays. CRB analysis is presented for near-field source localization in [96, 97] using a ULA and a UCA respectively. In [98], closed-form CRB expressions have been derived for a 3-D array made from ULA branches. The formulations developed in previous works make use of the standard spatial data model.
We will make use of the transformed data model, Equation 5.28, as our observation. Under the stochastic assumption, the unknown signal S(k) is taken to be circularly Gaussian distributed
with zero mean. The parameter vector includes the DOAs, the signal covariances and the noise variance. However, the DOAs are usually the parameters of interest in array signal processing. A closed-form expression for the stochastic CRB of the DOAs is presented herein. Hence, the unknown direction parameter vector taken here is

α = [θ T φT ]T (5.46)

where θ = [θ1 · · · θL ]T and φ = [φ1 · · · φL ]T .

5.6.1 Existence of the Stochastic CRB in Spherical Harmonics Domain

The existence of the stochastic CRB is first validated for the spherical harmonics data model. In this context, the probability density function (PDF) of the observed data model is shown to satisfy the regularity condition. The mean of the observation, from Equations 5.28 and 5.30 under the stochastic signal assumption, is

E[D_{nm}(\Psi; k)] = Y^{H}(\Psi)\, E[S(k)] + \eta(k)\, E[V(k)] = 0.

The covariance matrix of the observation can be written as

R_{D_{nm}}(k) \equiv R_D = E[\, D_{nm}(k)\, D_{nm}(k)^{H} \,] = Y^{H}(\Psi)\, R_S(k)\, Y(\Psi) + \sigma^2 C \qquad (5.47)

where C = \eta \eta^{H}. R_{D_{nm}} is replaced with R_D for notational simplicity. Hence, for the observation D_{nm}(k) \in \mathbb{C}^{(N+1)^2} with D_{nm} \sim \mathcal{N}(0, R_D), the probability density function (likelihood function) can be written as [99, p. 502],

p(D_{nm}(k); \alpha) = \frac{1}{\pi^{(N+1)^2}\, |R_D|}\, \exp\big( -D_{nm}^{H} R_D^{-1} D_{nm} \big) \qquad (5.48)

where |.| denotes the determinant.


Utilizing D_{nm}^{H} R_D^{-1} D_{nm} = tr\{ D_{nm} D_{nm}^{H} R_D^{-1} \}, the log-likelihood function can be written as

\ln p(D_{nm}(k); \alpha) = K_0 - \ln |R_D| - tr\{ D_{nm} D_{nm}^{H} R_D^{-1} \} \qquad (5.49)

where K_0 is a constant and tr\{\cdot\} denotes the trace. According to the CRB theorem [100], if the likelihood function satisfies the regularity condition

E\Big[ \frac{\partial \ln p(D_{nm}(k); \alpha)}{\partial \alpha} \Big] = 0, \qquad (5.50)

then the variance of any unbiased estimator of the rth parameter α_r satisfies the inequality

var(α̂r ) ≥ [F −1 (α)]rr (5.51)

where the Fisher information matrix F (α) is given by


[F(\alpha)]_{rs} = -E\Big[ \frac{\partial^2 \ln p(D_{nm}(k); \alpha)}{\partial \alpha_r\, \partial \alpha_s} \Big]. \qquad (5.52)

Using the identities \frac{\partial \ln |R_D|}{\partial \alpha} = tr\big\{ R_D^{-1} \frac{\partial R_D}{\partial \alpha} \big\} and \frac{\partial R_D^{-1}}{\partial \alpha} = -R_D^{-1} \frac{\partial R_D}{\partial \alpha} R_D^{-1}, and the fact that the expectation and trace operations commute, it can be shown that the given likelihood function satisfies the regularity condition.

5.6.2 CRB Analysis in Spherical Harmonics Domain

For developing the CRB expression, the Fisher information matrix is obtained first. The steps involved in obtaining the Fisher information matrix from Equation 5.28 are detailed in Appendix A.1. The final expressions for the Fisher information matrix blocks are:

F_{\theta\phi} = 2\,Re\big\{ (R_S Y R_D^{-1} Y^{H} R_S)^{T} \odot (\dot{Y}_\theta R_D^{-1} \dot{Y}_\phi^{H}) + (R_S Y R_D^{-1} \dot{Y}_\theta^{H})^{T} \odot (R_S Y R_D^{-1} \dot{Y}_\phi^{H}) \big\}

F_{\theta\theta} = 2\,Re\big\{ (R_S Y R_D^{-1} Y^{H} R_S)^{T} \odot (\dot{Y}_\theta R_D^{-1} \dot{Y}_\theta^{H}) + (R_S Y R_D^{-1} \dot{Y}_\theta^{H})^{T} \odot (R_S Y R_D^{-1} \dot{Y}_\theta^{H}) \big\}

where \odot denotes the Hadamard product, Y represents Y(\Psi), and the vector derivative of the steering matrix Y^{H}(\Psi) is defined as

\dot{Y}_\theta^{H} = \sum_{r=1}^{L} \dot{Y}_{\theta_r}^{H}, \qquad \dot{Y}_{\theta_r}^{H} = \frac{\partial Y^{H}}{\partial \theta_r}.

The steps involved in the computation of \frac{\partial Y^{H}}{\partial \theta_r} and \frac{\partial Y^{H}}{\partial \phi_r} are detailed in Appendix A.2. F_{\phi\phi} and F_{\phi\theta} can be expressed in a similar manner. The Fisher information matrix is finally given by

F = \begin{bmatrix} F_{\theta\theta} & F_{\theta\phi} \\ F_{\phi\theta} & F_{\phi\phi} \end{bmatrix}.

Now the closed-form CRB can be computed using Equation 5.51.
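Although the thesis derivation proceeds via Appendix A.1, the Fisher information matrix of a zero-mean circularly Gaussian observation can also be checked numerically with the standard Slepian–Bangs formula, F_rs = tr(R^{-1} ∂R/∂α_r R^{-1} ∂R/∂α_s) per snapshot. The sketch below uses a hypothetical smooth two-parameter manifold standing in for Y^H(Ψ), and verifies the structural properties expected of F: symmetry and positive semidefiniteness.

```python
import numpy as np

# toy model R(alpha) = p * a(alpha) a(alpha)^H + sigma2 * I, with a
# hypothetical "steering" map standing in for Y^H(Psi)
M = 8
rng = np.random.default_rng(1)
k_th, k_ph = rng.standard_normal(M), rng.standard_normal(M)

def steer(theta, phi):
    return np.exp(1j * (k_th * theta + k_ph * phi)) / np.sqrt(M)

def covar(alpha, p=1.0, sigma2=0.1):
    a = steer(*alpha)
    return p * np.outer(a, a.conj()) + sigma2 * np.eye(M)

def fim(alpha, h=1e-6):
    """Slepian-Bangs FIM per snapshot (multiply by Ns for Eq. 5.52's data record)."""
    Ri = np.linalg.inv(covar(alpha))
    dR = []
    for r in range(len(alpha)):
        e = np.zeros(len(alpha)); e[r] = h
        dR.append((covar(alpha + e) - covar(alpha - e)) / (2 * h))  # central difference
    F = np.zeros((len(alpha), len(alpha)))
    for r in range(len(alpha)):
        for s in range(len(alpha)):
            F[r, s] = np.real(np.trace(Ri @ dR[r] @ Ri @ dR[s]))
    return F

alpha0 = np.array([0.35, 0.9])      # (theta, phi) in radians
F = fim(alpha0)
crb = np.diag(np.linalg.inv(F))     # Eq. (5.51): per-snapshot bound on the variances
```

The FIM is a Gram matrix under the inner product tr(R^{-1} A R^{-1} B) on Hermitian matrices, so it is symmetric and positive semidefinite by construction.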


A behavioral study of the stochastic CRB at various SNRs and numbers of snapshots is presented for the Eigenmike microphone array [39]. The order of the array was taken to be N = 3. The signal and noise are taken to be Gaussian distributed with zero mean. A source with DOA (20◦, 50◦) is considered. Two sets of simulations are conducted with 500 independent trials. In the first set, the simulation is conducted for 300 snapshots at various SNRs. In the second set, the CRB is computed for various numbers of snapshots at an SNR of 20 dB. The CRB for azimuth and elevation is plotted in Figure 5.6. It can be noted that the CRB decreases as the SNR increases. A similar observation is made when a larger number of snapshots is used.

Figure 5.6: Variation of CRB for elevation (θ) and azimuth (φ) estimation (a) at various SNR
with 300 snapshots, (b) with varying snapshots at SNR 20dB. Source is located at (20◦ , 50◦ ).

5.7 Performance Evaluation

Experiments on source localization and source tracking [101] are performed to evaluate the proposed SH-MGD method. Results on source localization are presented as the cumulative root mean square error (RMSE) for noisy and reverberant environments. A statistical analysis of the proposed method for source localization is presented as the probability of resolution. The probability of resolution is a measure of the ability to resolve the sources within a confidence interval. Additionally, narrowband source tracking results are also discussed. Tracking results are presented as the estimated two-dimensional trajectory of the elevation angle for a fixed azimuth. The proposed method is compared with SH-MUSIC and SH-MVDR. An Eigenmike microphone array [39] is utilized in the experiments. It consists of 32 microphones embedded in a rigid sphere of radius 4.2 cm.

5.7.1 Experiments on Far-field Source Localization in Noisy Environments

Two far-field sources at locations (30◦ , 35◦ ) and (50◦ , 60◦ ) are considered. A fourth order
Eigenmike system is utilized for localization experiments. The azimuth and elevation of the
sources are estimated using SH-MGD, SH-MUSIC and SH-MVDR at various SNRs. Two
hundred independent trials are performed and the locations of the sources are estimated.
The results are presented as cumulative root mean square error. The cumulative RMSE is
defined as

RMSE = \sqrt{ \frac{1}{4T} \sum_{t=1}^{T} \sum_{l=1}^{2} \Big[ \big(\theta_l - \hat{\theta}_l^{(t)}\big)^2 + \big(\phi_l - \hat{\phi}_l^{(t)}\big)^2 \Big] }, \qquad (5.53)

where t indicates the trial number, T is the total number of trials and l denotes the source number. (θ_l, φ_l) is the actual source location, while (\hat{\theta}_l^{(t)}, \hat{\phi}_l^{(t)}) are the corresponding estimates.
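The cumulative RMSE of Equation 5.53 reduces to a few lines of numpy; the sketch below is written for a general number of sources L (with L = 2 the normalizer 2LT is the 4T of the equation).

```python
import numpy as np

def cumulative_rmse(theta_true, phi_true, theta_hat, phi_hat):
    """Cumulative RMSE of Eq. (5.53); the hat-arrays are (T trials, L sources)."""
    theta_hat = np.asarray(theta_hat, float)
    phi_hat = np.asarray(phi_hat, float)
    T, L = theta_hat.shape
    sq = (np.asarray(theta_true, float) - theta_hat)**2 \
       + (np.asarray(phi_true, float) - phi_hat)**2
    return float(np.sqrt(sq.sum() / (2 * L * T)))

# one trial, two sources: a 3-degree elevation error on the first source only
r = cumulative_rmse([30.0, 50.0], [35.0, 60.0], [[33.0, 50.0]], [[35.0, 60.0]])
```

Here r = sqrt(9/4) = 1.5, i.e. the single 3-degree error averaged over the four angle estimates.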
The cumulative RMSE is presented as a bar plot in Figure 5.7 for various SNRs. It is to be noted that the proposed SH-MGD performs considerably better than the conventional SH-MUSIC and SH-MVDR methods at low SNRs. The high error at low SNR is noted because SH-MVDR is unable to resolve the two sources.

5.7.2 Experiments on Far-field Source Localization in Reverberant Envi-


ronment

To evaluate the proposed method for robustness under reverberation, source localization
experiments are conducted at various reverberation times T60 . A detailed discussion on
reverberation can be found in Section 3.3.2. A room with dimensions, 7.3m × 6.2m × 3.4m

Figure 5.7: Cumulative RMSE in source angle estimation at various SNRs for two hundred
iterations. The sources are located at (30◦ , 35◦ ) and (50◦ , 60◦ ).

is utilized in the experiments. The room impulse response for spherical microphone array is
generated as in [102].
The experiments are performed at various reverberation times (T60). Two far-field sources with locations (30◦, 60◦) and (35◦, 50◦) are considered. The order of the array is assumed to be N = 3. Localization experiments are conducted for 300 iterations at three different reverberation times: 150 ms, 200 ms and 250 ms. The experiment is repeated for the three methods SH-MGD, SH-MUSIC and SH-MVDR. Results are presented as RMSE values in Table 5.1. SH-MGD has a consistently lower RMSE than the other methods.

Table 5.1: Comparison of RMSE of various methods at different reverberation times (T60).

Angle  Method     T60 (150 ms)  T60 (200 ms)  T60 (250 ms)
θ      SH-MGD     0.6403        0.6419        0.6475
θ      SH-MUSIC   0.6688        0.8144        0.7989
θ      SH-MVDR    1.1034        1.1579        1.1738
φ      SH-MGD     1.4387        1.4665        1.4866
φ      SH-MUSIC   1.7866        1.9127        1.6484
φ      SH-MVDR    2.276         2.3481        2.4927

5.7.3 Statistical Evaluation

In this section, a statistical evaluation of the source localization methods is illustrated using the probability of resolution at various SNRs. The probability of resolution is given by

P_r = \frac{1}{4T} \sum_{t=1}^{T} \sum_{l=1}^{2} \Big[ Pr\big(|\theta_l - \hat{\theta}_l^{(t)}| \le \zeta\big) + Pr\big(|\phi_l - \hat{\phi}_l^{(t)}| \le \zeta\big) \Big]
    = \frac{1}{4T} \sum_{t=1}^{T} \sum_{l=1}^{2} \Big[ sgn\big(\zeta - |\theta_l - \hat{\theta}_l^{(t)}|\big) + sgn\big(\zeta - |\phi_l - \hat{\phi}_l^{(t)}|\big) \Big], \qquad (5.54)

where ζ is the confidence interval, Pr(·) denotes the probability of an event, and sgn(x) is defined as

sgn(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad (5.55)
Two sources with locations (30◦, 35◦) and (50◦, 60◦) are considered. The confidence interval is taken to be ζ = 3◦. A 4th order spherical microphone array is used in the experiments. The probability is calculated over two hundred independent trials. Results on the probability of resolution are listed in Table 5.2 for various SNRs. At low SNR, both SH-MGD and SH-MUSIC outperform SH-MVDR. The zero probability of resolution of SH-MVDR is due to its inability to resolve the sources at low SNR within the given confidence interval. At high SNRs, all the methods provide similar performance.
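Since sgn(ζ − |e|) in Equation 5.54 is simply the indicator of |e| ≤ ζ, the probability of resolution is the fraction of angle estimates that fall within the confidence interval; a minimal sketch:

```python
import numpy as np

def prob_resolution(theta_true, phi_true, theta_hat, phi_hat, zeta):
    """Probability of resolution, Eq. (5.54); hat-arrays are (T trials, L sources)."""
    theta_hat = np.asarray(theta_hat, float)
    phi_hat = np.asarray(phi_hat, float)
    T, L = theta_hat.shape
    # sgn(zeta - |e|) = 1 iff |e| <= zeta, so just count hits
    hits = (np.abs(np.asarray(theta_true, float) - theta_hat) <= zeta).sum() \
         + (np.abs(np.asarray(phi_true, float) - phi_hat) <= zeta).sum()
    return hits / (2.0 * L * T)

# one trial, two sources, zeta = 3 degrees: errors (2, 4) in theta and (1, 1) in phi
p = prob_resolution([30.0, 50.0], [35.0, 60.0], [[32.0, 54.0]], [[36.0, 59.0]], 3.0)
```

Three of the four estimates are within 3 degrees, so p = 0.75.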

Table 5.2: Probability of resolution at various SNRs for 200 iterations. Sources are taken at (30◦, 35◦) and (50◦, 60◦).

Methods    SNR (5 dB)  SNR (10 dB)  SNR (15 dB)  SNR (20 dB)
SH-MGD     0.9167      0.9971       1            1
SH-MUSIC   0.9444      0.9829       0.9987       1
SH-MVDR    0           0            0.4179       1

5.7.4 Experiments on Narrowband Source Tracking

An application of the proposed method is illustrated using narrowband source tracking. Source tracking is one of the major applications of acoustic source localization in audio surveillance. In this section, the elevation angle of a moving source is tracked. The source continuously emits a narrowband signal impinging on the spherical array. The azimuthal angle (φ) of the source is fixed at 45◦ and the elevation angle is varied according to the trajectory shown in Figure 5.8.


Figure 5.8: Trajectory of elevation angle (θ) followed by the moving source with time for a
fixed azimuth φ = 45◦ .

The elevation angle is tracked at fixed azimuth using SH-MGD and SH-MUSIC methods.
Figure 5.9(a) illustrates the tracked trajectory by SH-MUSIC method. It can be noted that
the trajectory is not estimated well due to spurious peaks that are present in SH-MUSIC
spectrum. Trajectory obtained from SH-MGD is shown in Figure 5.9(b). The estimated
trajectory is close to the actual trajectory indicating efficient tracking.


Figure 5.9: Tracking results for elevation using (a) SH-MUSIC and (b) SH-MGD. The azimuth is fixed at 45◦.

The tracking experiment is repeated for 25 different trajectories. The elevation angle is fixed at 90◦. The azimuth angle is varied as different sinusoids, similar to the one shown in Figure 5.8. An average error distribution (AED) of the tracking error is obtained. AED plots of the tracking error obtained from SH-MUSIC and SH-MGD are illustrated in Figure 5.10. It may be noted that the error variance for SH-MGD is smaller than that of SH-MUSIC.


Figure 5.10: Average error distribution plot for tracking error using SH-MUSIC and SH-MGD
Method.

5.8 Summary and Contributions

In this chapter, a far-field data model in the spherical harmonics domain is formulated. The advantages of array processing in the spherical harmonics domain are also detailed. A high resolution source localization method for spherical microphone arrays, called the spherical harmonics MUSIC-Group delay, is proposed. The formulation and analysis of the Cramér-Rao bound for far-field sources is presented in the spherical harmonics domain. Experimental results on multi-source localization in noisy and reverberant environments indicate the robustness of the method. RMSE and statistical analyses are presented to evaluate the performance of the source localization methods. Experiments on tracking a single source are motivating enough to extend this approach to tracking multiple closely spaced sources in real time within a Kalman filter framework.
Chapter 6

The Spherical Harmonics root-MUSIC

6.1 Introduction

Accurate and search-free algorithms for direction of arrival (DOA) estimation have been a very active area of research in source localization. Root-MUSIC (MUltiple SIgnal Classification) [20] and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [21] fall under this category. As discussed in Section 3.4.3.5, root-MUSIC estimates the DOAs as the roots of the MUSIC polynomial, owing to the Vandermonde structure of the array manifold (steering vector) in the case of a uniform linear array (ULA). Such structure is not observed in the array manifold of a uniform circular array (UCA) [48]. Zoltowski proposed a beamspace transformation based on phase mode excitation to obtain a Vandermonde structure in the array manifold with respect to the azimuth angle [22]. This enables applying root-MUSIC for azimuth estimation at a given elevation. The technique was further extended to sparse UCA root-MUSIC, which utilizes a modified beamspace transformation, in [103]. Another approach for extending the ULA root-MUSIC to a planar array is presented in [104] using manifold separation. The idea of manifold separation is to write the planar array steering vector as a product of a characteristic matrix of the array and a vector with Vandermonde structure depending on the azimuth angle. The manifold separation utilizing spherical harmonics (SH)

is introduced in [105] to unify different decompositions of the array manifold.


After the introduction of higher order spherical microphone array and associated signal
processing in [10, 11], various existing DOA estimation techniques have been reformulated
in the spherical harmonics domain. The element space MUSIC is implemented in terms of
spherical harmonics, called SH-MUSIC, in [15, 16]. The minimum variance distortionless
response (MVDR) spectrum in terms of spherical harmonics, SH-MVDR, is utilized for DOA
estimation in [16].
In this chapter, SH-root-MUSIC (SH-RM), a polynomial rooting technique for DOA estimation using a spherical microphone array, is proposed. The root-MUSIC technique, in general, has low computational complexity because of the direct polynomial solution [106]. Also, it provides an exact solution and is not limited by the discretization issues associated with the SH-MUSIC and SH-MVDR methods for DOA estimation. However, as in earlier work on root-MUSIC for planar arrays [22, 103, 104], the proposed SH-root-MUSIC can estimate the azimuth only at a fixed elevation. This is because all approaches to root-MUSIC induce a Vandermonde structure in azimuth. In the following section, the formulation of root-MUSIC and the proof of the Vandermonde structure in azimuth are presented.

6.2 Formulation of root-MUSIC in Spherical Harmonics Do-


main

A spherical microphone array of order N, radius r_a and number of sensors I is considered. The order of a spherical microphone array is defined in Section 2.5.1. A sound field of L plane waves with wavenumber k is incident on the array. The lth source location is denoted by Ψ_l = (θ_l, φ_l), where θ is the elevation angle and φ the azimuthal angle. Similarly, the ith sensor location is given by Φ_i = (θ_i, φ_i).
The spherical harmonics data model, from Equation 5.28, can be written as

D_{nm}(\Psi; k) = Y^{H}(\Psi)\, S(k) + Z_{nm}(k) \qquad (6.1)

where Y^{H}(\Psi) is the (N+1)^2 \times L steering matrix in the spherical harmonics domain and (\cdot)^{H} denotes the conjugate transpose. A particular steering vector can be written as

a_{nm}(\Psi) = y^{H}(\Psi) = [\, Y_0^{0*}(\Psi),\; Y_1^{-1*}(\Psi),\; Y_1^{0*}(\Psi),\; Y_1^{1*}(\Psi),\; \ldots,\; Y_N^{N*}(\Psi) \,]^{T} \qquad (6.2)

where Y_n^m is the spherical harmonic of order n and degree m, given by

Y_n^m(\Psi) = \sqrt{ \frac{(2n+1)(n-m)!}{4\pi (n+m)!} }\, P_n^m(\cos\theta)\, e^{jm\phi} \qquad (6.3)

\forall\; 0 \le n \le N,\; -n \le m \le n.

The Y_n^m are solutions to the Helmholtz equation, P_n^m are the associated Legendre functions, and (\cdot)^{*} denotes the complex conjugate.
SH-root-MUSIC estimates the DOAs as roots of the SH-MUSIC polynomial. Hence, rewriting the expression for the SH-MUSIC spectrum from Equation 5.42, we have

P_{SH\text{-}MUSIC}(\Psi) = \frac{1}{ a_{nm}^{H}(\Psi)\, Q_{nm} Q_{nm}^{H}\, a_{nm}(\Psi) } \qquad (6.4)

where Q_{nm} is the noise subspace obtained from the eigenvalue decomposition of the modal array covariance matrix R_{D_{nm}}. The modal array covariance matrix is written as

R_{D_{nm}}(k) = E[\, D_{nm}(k)\, D_{nm}(k)^{H} \,] = Y^{H}(\Psi)\, R_S(k)\, Y(\Psi) + R_{Z_{nm}}(k) \qquad (6.5)

where R_S(k) = E[\, S(k)\, S^{H}(k) \,] is the signal covariance matrix.

Figure 6.1: Plot of SH-MUSIC illustrating DOA estimation using fourth order Eigenmike
system. Sources are located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB.

The SH-MUSIC plot is shown in Figure 6.1 for two sources at azimuths (40◦, 70◦) and co-elevation 20◦. The two peaks correspond to the two sources. It is to be noted that the SH-MUSIC spectrum needs human intervention or a comprehensive search algorithm to estimate the DOA of the desired source. The resolution is also limited by the discretization at which the spectrum is evaluated. SH-root-MUSIC overcomes these limitations in estimating the DOA. However, for root-MUSIC to be applicable in the spherical harmonics domain, the Vandermonde structure of the spherical harmonics steering vector needs to be established.
The Vandermonde structure in the spherical harmonics steering vector is illustrated herein using the manifold separation technique. Utilizing Equations 6.2 and 6.3, the steering vector for co-elevation θ_0 can be written in a more compact form as

y^{H}(\Psi) = y^{H}(\theta_0, \phi) = [\, f_{00},\; f_{1(-1)} e^{j\phi},\; f_{10},\; f_{11} e^{-j\phi},\; \cdots,\; f_{NN} e^{-jN\phi} \,]^{T} \qquad (6.6)

where

f_{nm} = \sqrt{ \frac{(2n+1)(n-m)!}{4\pi (n+m)!} }\, P_n^m(\cos\theta_0). \qquad (6.7)

Re-writing Equation 6.6 in matrix form,

y^{H}(\theta_0, \phi) = F(\theta_0)\, d(\phi) \qquad (6.8)

where

F(\theta_0) = diag(\, f_{00},\, f_{1(-1)},\, f_{10},\, f_{11},\, \cdots,\, f_{NN} \,) \qquad (6.9)

d(\phi) = [\, 1,\; e^{j\phi},\; 1,\; e^{-j\phi},\; \cdots,\; e^{-jN\phi} \,]^{T} \qquad (6.10)

The vector d(φ) consists of only the exponential terms containing the azimuth angle. Each subvector of d(φ) corresponding to a particular order follows a Vandermonde structure. For example, the subvector of d(φ) corresponding to the first order is [e^{jφ}, 1, e^{−jφ}], which exhibits a Vandermonde structure.
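The manifold separation of Equations 6.6–6.10 can be verified numerically: build the diagonal matrix F(θ0) and the phase vector d(φ) and compare F(θ0) d(φ) against the directly computed conjugated harmonics. This is a minimal check with an arbitrary order and direction, not the thesis code.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, theta, phi):
    ma = abs(m)
    c = np.sqrt((2*n + 1) * factorial(n - ma) / (4*np.pi * factorial(n + ma)))
    y = c * lpmv(ma, n, np.cos(theta)) * np.exp(1j * ma * phi)
    return (-1)**ma * np.conj(y) if m < 0 else y

N = 3
nm = [(n, m) for n in range(N + 1) for m in range(-n, n + 1)]
th0, ph = np.deg2rad(20.0), np.deg2rad(40.0)

# F(theta0): diagonal of the f_nm of Eq. (6.7) -- Y_n^m at phi = 0 is real
f = np.array([Ynm(n, m, th0, 0.0).real for n, m in nm])
F = np.diag(f)

# d(phi): pure phase terms e^{-j m phi}, Eq. (6.10)
d = np.array([np.exp(-1j * m * ph) for _, m in nm])

# direct evaluation of y^H(theta0, phi) for comparison with Eq. (6.8)
yH = np.array([np.conj(Ynm(n, m, th0, ph)) for n, m in nm])
```

The first-order subvector of d is [e^{jφ}, 1, e^{−jφ}], exactly as stated in the text.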
Utilizing Equations 6.2 and 6.8 in Equation 6.4, the inverse SH-MUSIC cost function can be written as

P_{SHM}^{-1}(\phi) = d^{H}(\phi)\, F^{H}(\theta_0)\, Q_{nm} Q_{nm}^{H}\, F(\theta_0)\, d(\phi) = d^{H}(\phi)\, F^{H}(\theta_0)\, C\, F(\theta_0)\, d(\phi) \qquad (6.11)

where C = Q_{nm} Q_{nm}^{H}.

Substituting z = e^{j\phi} in Equation 6.11, the inverse SH-MUSIC cost function assumes the form of a polynomial of degree 4N, given by

P_{SHM}^{-1}(\phi) = \sum_{u=-2N}^{2N} C_u\, z^{u} \qquad (6.12)

where the coefficients C_u are obtained by collecting the terms with equal powers of z. The polynomial has 4N roots. However, these are not independent: if z is a root of the polynomial, then 1/z^{*} is also a root. Hence, 2N roots lie within the unit circle and 2N outside it. Of the 2N roots within the unit circle, the L roots closest to the unit circle correspond to the DOAs. This is illustrated in Figure 6.2 for a spherical microphone array of order N = 4.


Figure 6.2: Plot of SH-root-MUSIC illustrating the actual DOA estimates (red stars) and
noisy DOA estimates (blue triangles). A fourth order Eigenmike system is used. Sources are
located at (20◦ ,40◦ ) and (20◦ ,70◦ ) with SNR 15dB.

The roots are plotted for two sources with co-elevation angle 20◦ and azimuth angles (40◦, 70◦) at an SNR of 15 dB. All the roots within and near the unit circle are shown in the figure. The DOA can be estimated from a root z by using the relation

\phi = \Im(\ln(z)) \qquad (6.13)

where \Im(\cdot) denotes the imaginary part.
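The full SH-root-MUSIC pipeline can be sketched end to end. The following illustration (a sketch under assumed parameters, not the thesis implementation) forms M = F^H C F, collects the Laurent coefficients C_u by summing the entries of M along m_i − m_j = u, roots the polynomial, and reads the azimuths off the roots inside and closest to the unit circle. A tiny Hermitian perturbation of the covariance splits each conjugate-reciprocal root pair on either side of the circle.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, theta, phi):
    ma = abs(m)
    c = np.sqrt((2*n + 1) * factorial(n - ma) / (4*np.pi * factorial(n + ma)))
    y = c * lpmv(ma, n, np.cos(theta)) * np.exp(1j * ma * phi)
    return (-1)**ma * np.conj(y) if m < 0 else y

N = 3
nm = [(n, m) for n in range(N + 1) for m in range(-n, n + 1)]
ms = np.array([m for _, m in nm])
th0 = np.deg2rad(20.0)
phis = np.deg2rad([40.0, 70.0])             # true azimuths, L = 2

def a_sh(phi):                               # a_nm = y^H(theta0, phi)
    return np.array([np.conj(Ynm(n, m, th0, phi)) for n, m in nm])

R = sum(np.outer(a_sh(p), a_sh(p).conj()) for p in phis) + 0.01 * np.eye(len(nm))
rng = np.random.default_rng(0)
E = rng.standard_normal((len(nm),)*2) + 1j * rng.standard_normal((len(nm),)*2)
R = R + 1e-4 * (E + E.conj().T)              # small Hermitian perturbation

_, V = np.linalg.eigh(R)
Q = V[:, :len(nm) - len(phis)]               # noise subspace
f = np.array([Ynm(n, m, th0, 0.0).real for n, m in nm])
M = np.diag(f).conj().T @ (Q @ Q.conj().T) @ np.diag(f)   # F^H C F, Eq. (6.11)

# on |z| = 1, d^H M d = sum_{i,j} M_ij z^{m_i - m_j}; collect C_u, Eq. (6.12)
coeffs = np.zeros(4*N + 1, dtype=complex)
for i in range(len(nm)):
    for j in range(len(nm)):
        coeffs[2*N + (ms[i] - ms[j])] += M[i, j]

roots = np.roots(coeffs[::-1])               # highest power of z first
inside = roots[np.abs(roots) < 1.0]
picked = inside[np.argsort(-np.abs(inside))][:len(phis)]  # closest to the circle
phi_est = np.sort(np.mod(np.angle(picked), 2*np.pi))      # Eq. (6.13)
```

Because M is Hermitian, C_{−u} = C_u*, so the roots come in (z, 1/z*) pairs and exactly one root per source falls inside the unit circle.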



6.3 Performance Evaluation

The performance of the proposed SH-root-MUSIC method is evaluated using experiments on source localization. The first category of experiments provides source localization results as the root mean square error (RMSE) at various SNRs. Additionally, the statistical significance of the method is shown based on the probability of resolution at various SNRs.
An Eigenmike microphone array [39] is used for the simulation. It consists of 32 mi-
crophones embedded in a rigid sphere of radius 4.2 cm. The order of the array is taken to
be N = 4. Two sources with azimuth (40◦ , 80◦ ) and co-elevation 20◦ are considered. The
additive noise is assumed to be zero mean Gaussian distributed. A total of 500 independent
Monte Carlo trials are run for the RMSE and probability of resolution estimation.

6.3.1 Experiments on Source Localization

The experiments on source localization are presented as the cumulative RMSE at various signal to noise ratios (SNRs). The proposed method is compared with the other subspace-based methods SH-MUSIC and SH-MGD. The results are presented as the cumulative RMSE over both sources. The RMSE is calculated as

RMSE = \sqrt{ \frac{1}{2T} \sum_{t=1}^{T} \sum_{l=1}^{2} \big(\phi_l - \hat{\phi}_l^{(t)}\big)^2 }, \qquad (6.14)

where t indicates the trial number and l denotes the source number. The RMSE values are given in Table 6.1. The high RMSE of SH-root-MUSIC and SH-MUSIC at low SNR is because of their inability to resolve the sources.

Table 6.1: Comparison of RMSE of various source localization methods at different SNRs.

Method     SNR (5dB)  SNR (10dB)  SNR (15dB)  SNR (20dB)  SNR (25dB)  SNR (30dB)
SH-MGD     2.999      1.848       1.490       1.366       1.360       1.381
SH-RM      8.283      3.308       0.997       0.873       0.662       0.470
SH-MUSIC   10.273     7.321       0.770       0.722       0.731       0.784

6.3.2 Statistical Analysis

Statistical analysis of the proposed method is described in terms of probability of resolution


for various SNRs. The confidence interval of ζ = 10◦ is used while calculating the probability
over five hundred independent trials.


Figure 6.3: Probability of resolution plot for two sources with azimuth (40◦ , 80◦ ) and co-
elevation 20◦ .

The probability of resolution is given by

P_r = \frac{1}{2T} \sum_{t=1}^{T} \sum_{l=1}^{2} Pr\big( |\phi_l - \hat{\phi}_l^{(t)}| \le \zeta \big) = \frac{1}{2T} \sum_{t=1}^{T} \sum_{l=1}^{2} sgn\big( \zeta - |\phi_l - \hat{\phi}_l^{(t)}| \big), \qquad (6.15)

where Pr(\cdot) denotes the probability of an event, and sgn(x) is defined as

sgn(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad (6.16)
The result is presented as probability of resolution plot in Figure 6.3. It is to be noted
that the SH-root-MUSIC performs better than the SH-MUSIC method. However, SH-MGD
performs better than both of these methods.

6.4 Summary and Contributions

In this chapter, a high resolution source localization method called SH-root-MUSIC is proposed in the spherical harmonics domain. SH-root-MUSIC does not require any search for estimating the DOAs. It provides the DOA estimates as direct roots of the SH-MUSIC polynomial.
The Vandermonde structure of array manifold in spherical harmonics domain is shown using
manifold separation technique. The robustness of the method is illustrated using source local-
ization experiments at various SNRs. RMSE and probability of resolution measures indicate
the relevance of the proposed method.
Chapter 7

Near-field Source Localization over Spherical Microphone Array

7.1 Introduction

There has been extensive work on far-field source localization using spherical microphone arrays. The element space MUltiple SIgnal Classification (MUSIC) [3] is implemented in terms of spherical harmonics (SH), called SH-MUSIC, in [15, 16]. The Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) [21] algorithm is extended to spherical arrays in [17, 86]. The minimum variance distortionless response (MVDR) [2] method in terms of spherical harmonics, SH-MVDR, is utilized for DOA estimation in [16]. MUSIC-Group delay [19, 25] has also been extended to spherical arrays in [14]. All these source localization methods deal with the planar wavefronts of far-field sources. However, in applications like the Close Talk Microphone (CTM) and video conferencing, the assumption of a planar wavefront is no longer valid.
Principles of near-field source localization using a spherical microphone array were first detailed in [107]. A spatially orthonormal decomposition of the sound field due to a near-field source was used. The work proposed a close-talk spherical microphone array which is orientation-invariant with respect to the attenuation of far-field interferences. A method to estimate the distance from the array to a near-field source, using the ratio of mode energies of a spherical orthonormal expansion of the sound field, was described [107]. The near-field criterion for a spherical array was formally formulated in [24] in terms of the range of the near-field sources. However, the simultaneous estimation of the range and bearing of multiple near-field sources using a spherical microphone array has hitherto not been investigated. Our work on near-field source localization [13] provides an insight into this. A detailed analysis is required in this context.
In this chapter, a new data model is formulated for near-field source localization in spher-
ical harmonics domain. Various methods for simultaneous estimation of the range and the
bearing of near-field sources are proposed. Near-field beamforming weights are computed for
radial filtering analysis. Cramér-Rao bound is formulated for evaluating the estimators.

7.2 Formulation of Near-field Array Data Model in Spherical


Harmonics Domain

In this section, the formulation of the near-field data model in the spherical harmonics domain is described. The formulation starts with the near-field spatio-temporal data model, which is used to derive the spatio-frequency and subsequently the spherical harmonics data models for near-field sources.

7.2.1 Near-field Data model in Spatial Domain

A spherical microphone array of order N, radius r_a and number of sensors I is considered. The order of a spherical microphone array is defined in Section 2.5.1. A sound field of spherical waves with wavenumber k from L near-field sources is incident on the array. The lth source location is denoted by r_l = (r_l, Ψ_l), where Ψ_l = (θ_l, φ_l). The elevation angle θ is measured down from the positive z axis, while the azimuthal angle φ is measured counterclockwise from the positive x axis. Similarly, the ith sensor location is given by r_i = (r_a, Φ_i), where Φ_i = (θ_i, φ_i).
The theory of spherical wave propagation, as described in Section 2.4.2, suggests that the pressure at the ith microphone due to the lth near-field source s_l(t) can be expressed as

p_{il}(t) = \frac{ s_l(t - \tau_i(\Psi_l)) }{ |r_i - r_l| } \qquad (7.1)
|ri − rl |

Figure 7.1: Illustration of Near-field and far-field regions around spherical microphone array.
The ith microphone is positioned at ri and lth source at rl .

The time delay \tau_i(\Psi_l) can be calculated from Figure 7.1 as

\tau_i(\Psi_l) = \frac{ |r_i - r_l| }{ c } \qquad (7.2)

with c being the speed of sound. The total pressure at the ith microphone in the presence of noise can be written as

p_i(t) = \sum_{l=1}^{L} \frac{ s_l(t - \tau_i(\Psi_l)) }{ |r_i - r_l| } + v_i(t), \qquad (7.3)

where v_i(t) is the noise at the ith microphone and t = 1, 2, \cdots, N_s, with N_s being the number of snapshots. Taking the discrete Fourier transform (DFT), Equation 7.3 becomes

P_i(f_\nu) = \sum_{l=1}^{L} \frac{ e^{-j 2\pi f_\nu \tau_i(\Psi_l)} }{ |r_i - r_l| }\, S_l(f_\nu) + V_i(f_\nu), \qquad \nu = 1, \cdots, N_s, \qquad (7.4)

where j is the unit imaginary number. Utilizing Equation 7.2, and dropping ν for notational simplicity, Equation 7.4 can be re-written in the wavenumber (and hence frequency) domain as

P_i(k) = \sum_{l=1}^{L} \frac{ e^{-jk |r_i - r_l|} }{ |r_i - r_l| }\, S_l(k) + V_i(k). \qquad (7.5)
l=1
Rearranging Equation 7.5 in matrix form, the near-field data model in the spatial domain can finally be written as

P(k) = A(r, \Psi)\, S(k) + V(k) \qquad (7.6)

where A(r, Ψ) is the I × L near-field steering matrix, S is the L × N_s signal matrix and V is the I × N_s matrix of uncorrelated sensor noise. The noise components are assumed to be white, circularly Gaussian distributed with zero mean and covariance matrix σ²I, where I is an identity matrix. The dependency of A on k is dropped for notational simplicity. The steering matrix A(r, Ψ) is

A(r, \Psi) = [\, a(r_1, \Psi_1)\;\; a(r_2, \Psi_2)\; \ldots\; a(r_L, \Psi_L) \,], \qquad (7.7)

a(r_l, \Psi_l) = \Big[\, \frac{e^{-jk|r_1 - r_l|}}{|r_1 - r_l|}\;\; \frac{e^{-jk|r_2 - r_l|}}{|r_2 - r_l|}\; \ldots\; \frac{e^{-jk|r_I - r_l|}}{|r_I - r_l|} \,\Big]^{T}. \qquad (7.8)

7.2.2 Near-field Data model in Spherical Harmonics Domain

To utilize the advantages of spherical harmonics signal processing, the spatial domain data model in Equation 7.6 is converted to a data model in the spherical harmonics domain. The ith term in Equation 7.8 refers to the pressure at location r_i due to the lth unit amplitude source. This can alternatively be expanded in terms of spherical harmonics as (Section 2.5.2)

\frac{ e^{-jk|r_i - r_l|} }{ |r_i - r_l| } = \sum_{n=0}^{N} \sum_{m=-n}^{n} b_n(k, r_a, r_l)\, Y_n^m(\Psi_l)^{*}\, Y_n^m(\Phi_i) \qquad (7.9)

where b_n(k, r_a, r_l) is the nth order near-field mode strength. It is related to the far-field mode strength b_n(k, r_a) as [24]

b_n(k, r_a, r_l) = j^{-(n-1)}\, k\, b_n(k, r_a)\, h_n(k r_l). \qquad (7.10)

The far-field mode strength for an open sphere (virtual sphere) and a rigid sphere is given by

b_n(k, r) = 4\pi j^{n} j_n(kr), \quad \text{open sphere} \qquad (7.11)

b_n(k, r) = 4\pi j^{n} \Big( j_n(kr) - \frac{ j_n'(k r_a) }{ h_n'(k r_a) }\, h_n(kr) \Big), \quad \text{rigid sphere}. \qquad (7.12)

Here j_n is the spherical Bessel function of the first kind, h_n is the spherical Hankel function of the first kind and ' refers to the first derivative. As discussed in Section 2.5.3, for signal processing in spherical

harmonics domain, the mode strength is the deciding criterion for the near-field extent, rather than the usual Fraunhofer distance. This leads to the near-field range

r_a \leq r_l \leq \frac{k_{max}}{k} r_a  (7.13)

where k_{max} = \frac{N}{r_a}.  (7.14)
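The mode strengths of Equations 7.10-7.12 can be evaluated with scipy's spherical Bessel routines, forming the spherical Hankel function of the first kind as h_n = j_n + j y_n. This is a sketch under our own function names.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def sph_hankel1(n, z, derivative=False):
    # Spherical Hankel function of the first kind, h_n = j_n + j*y_n
    return spherical_jn(n, z, derivative) + 1j * spherical_yn(n, z, derivative)

def farfield_mode_strength(n, k, r, ra, rigid=True):
    # Equations 7.11 (open sphere) and 7.12 (rigid sphere)
    bn = spherical_jn(n, k * r)
    if rigid:
        bn = bn - (spherical_jn(n, k * ra, derivative=True)
                   / sph_hankel1(n, k * ra, derivative=True)) * sph_hankel1(n, k * r)
    return 4 * np.pi * (1j ** n) * bn

def nearfield_mode_strength(n, k, ra, rl, rigid=True):
    # Equation 7.10: near-field mode strength from the far-field one,
    # with the far-field mode strength evaluated on the array surface r = ra
    return (1j ** (-(n - 1))) * k * farfield_mode_strength(n, k, ra, ra, rigid) \
        * sph_hankel1(n, k * rl)

# Example: low-order near-field mode strengths on a rigid sphere of radius 4.2 cm
k, ra, rl = 2 * np.pi * 600 / 343, 0.042, 0.3
print(abs(nearfield_mode_strength(0, k, ra, rl)))
print(abs(nearfield_mode_strength(1, k, ra, rl)))
```

For small arguments the open-sphere mode strength of order zero approaches 4π, since j_0(x) → 1 as x → 0.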
Y_n^m in Equation 7.9 represents the spherical harmonic of order n and degree m. The spherical harmonics can be written from Section 2.5 as

Y_n^m(\Phi) = \sqrt{\frac{(2n + 1)(n - m)!}{4\pi(n + m)!}} P_n^m(\cos\theta) e^{jm\phi}  (7.15)

\forall \; 0 \leq n \leq N, \; -n \leq m \leq n,

with P_n^m being the associated Legendre functions, and (.)^* denotes the complex conjugate.
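Equation 7.15 can be implemented directly from scipy's associated Legendre routine and checked against the familiar closed forms of the low-order harmonics. This is a sketch; the helper name is ours, and scipy's `lpmv` includes the Condon-Shortley phase, as is conventional here.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, theta, phi):
    """Spherical harmonic of order n, degree m (Equation 7.15).

    theta : elevation (polar) angle, phi : azimuth angle.
    """
    norm = np.sqrt((2 * n + 1) * factorial(n - m) /
                   (4 * np.pi * factorial(n + m)))
    return norm * lpmv(m, n, np.cos(theta)) * np.exp(1j * m * phi)

# Sanity checks against closed forms: Y_0^0 = 1/sqrt(4*pi),
# Y_1^0 = sqrt(3/(4*pi)) * cos(theta)
theta, phi = 0.7, 1.3
print(np.isclose(Ynm(0, 0, theta, phi), 1 / np.sqrt(4 * np.pi)))        # True
print(np.isclose(Ynm(1, 0, theta, phi),
                 np.sqrt(3 / (4 * np.pi)) * np.cos(theta)))             # True
```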
Substituting the expression from Equation 7.9 in Equation 7.8, the steering matrix in Equation 7.7 can be written as

A(r, \Psi) = Y(\Phi) \left[ B(k, r_a, r_1) y^H(\Psi_1) \;\; B(k, r_a, r_2) y^H(\Psi_2) \;\; \cdots \;\; B(k, r_a, r_L) y^H(\Psi_L) \right]  (7.16)

where Y(\Phi) is an I \times (N + 1)^2 matrix. A particular ith row vector can be written as

y(\Phi_i) = \left[ Y_0^0(\Phi_i) \; Y_1^{-1}(\Phi_i) \; Y_1^0(\Phi_i) \; Y_1^1(\Phi_i) \; \ldots \; Y_N^N(\Phi_i) \right].  (7.17)

y(\Psi_l) is a 1 \times (N + 1)^2 row vector with a similar structure as in Equation 7.17, replacing \Phi_i with \Psi_l. The (N + 1)^2 \times (N + 1)^2 matrix B(k, r_a, r_l) is given by

B(k, r_a, r_l) = diag\left( b_0(k, r_a, r_l), b_1(k, r_a, r_l), b_1(k, r_a, r_l), b_1(k, r_a, r_l), \ldots, b_N(k, r_a, r_l) \right).  (7.18)

For the pressure sampled at I microphones of the spherical array, the spherical Fourier transform from Section 5.2.1 can be written as

P_{nm}(k) = Y^H(\Phi) \Gamma P(k)  (7.19)

where P_{nm} = \left[ P_{00} \; P_{1(-1)} \; P_{10} \; P_{11} \; \cdots \; P_{NN} \right]^T and \Gamma = diag(a_1, a_2, \cdots, a_I) is a diagonal matrix with elements a_i being the sampling weights [88]. Additionally, the following orthogonality property of spherical harmonics holds,

Y^H(\Phi) \Gamma Y(\Phi) = I  (7.20)

where I is the (N + 1)^2 \times (N + 1)^2 identity matrix.
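The orthogonality property of Equation 7.20 holds exactly for suitable sampling schemes. The sketch below uses Gauss-Legendre nodes in elevation with a uniform azimuth grid (a hypothetical choice for illustration, not the Eigenmike's actual 32-point scheme) and verifies Y^H Γ Y = I numerically; all function names are ours.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def sh_matrix(N, theta, phi):
    # I x (N+1)^2 matrix Y(Phi): one row y(Phi_i) per sample point (Equation 7.17)
    cols = []
    for n in range(N + 1):
        for m in range(-n, n + 1):
            norm = np.sqrt((2 * n + 1) * factorial(n - m) /
                           (4 * np.pi * factorial(n + m)))
            cols.append(norm * lpmv(m, n, np.cos(theta)) * np.exp(1j * m * phi))
    return np.stack(cols, axis=1)

N = 3
# Gauss-Legendre nodes in cos(theta) plus a uniform azimuth grid: one sampling
# for which the weights Gamma make Y^H Gamma Y exactly the identity
x, w = np.polynomial.legendre.leggauss(N + 1)
phi1d = 2 * np.pi * np.arange(2 * N + 1) / (2 * N + 1)
theta, phi = np.meshgrid(np.arccos(x), phi1d, indexing="ij")
weights = np.repeat(w, 2 * N + 1) * 2 * np.pi / (2 * N + 1)

Y = sh_matrix(N, theta.ravel(), phi.ravel())
G = np.diag(weights)
print(np.allclose(Y.conj().T @ G @ Y, np.eye((N + 1) ** 2)))  # True
```

The Gauss-Legendre rule with N+1 nodes integrates the polynomial products of degree up to 2N exactly, which is why the discrete orthogonality holds to machine precision here.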


Substituting Equation 7.16 in 7.6, multiplying both sides by Y^H(\Phi)\Gamma and utilizing Equations 7.19 and 7.20, the final spherical harmonics data model for near-field source localization becomes

P_{nm}(k) = \left[ B(r_1) y^H(\Psi_1) \; \cdots \; B(r_L) y^H(\Psi_L) \right] S(k) + V_{nm}(k).  (7.21)

Dependency of B(r_l) on k and r_a is dropped for notational simplicity. The near-field data model can be written in a more compact way as

P_{nm}(k) = A_{nm}(r, \Psi)S(k) + V_{nm}(k)  (7.22)

where A_{nm}(r, \Psi) will be called the near-field steering matrix in the spherical harmonics domain. The new data model is very similar to the spatial domain data model in Equation 7.6. However, the new steering matrix is given by

A_{nm}(r, \Psi) = \left[ B(r_1) y^H(\Psi_1), \cdots, B(r_L) y^H(\Psi_L) \right].  (7.23)

The data model in Equation 7.22 is utilized in the ensuing Section for near-field source
localization.

7.3 Near-field Source Localization in Spherical Harmonics Domain

Following the development of the near-field data model, various methods for joint range and bearing estimation are proposed in this Section. MUSIC, MUSIC-Group delay and MVDR are formulated in the spherical harmonics domain. From the expression of the near-field steering matrix in 7.23, a near-field steering vector in the spherical harmonics domain can be written as

a_{nm}(r, \Psi) = B(r) y^H(\Psi).  (7.24)

The search has to be performed over r as in Equation 7.13 and over \Psi with (0 \leq \theta \leq \pi, 0 \leq \phi \leq 2\pi).

7.3.1 Spherical Harmonics MUSIC for Near-field Source Localization

The MUSIC magnitude spectrum for near-field source localization in the spherical harmonics domain can now be defined as

P_{SH-MUSIC}(r, \Psi) = \frac{1}{a_{nm}^H(r, \Psi) Q_{nm} Q_{nm}^H a_{nm}(r, \Psi)}  (7.25)

where a_{nm}(r, \Psi) is the near-field steering vector in the SH domain, defined in Equation 7.24, and Q_{nm} is the noise subspace obtained from the eigenvalue decomposition of the modal covariance matrix R_{P_{nm}}, defined as

R_{P_{nm}} = E[P_{nm}(k) P_{nm}^H(k)].  (7.26)

The denominator of the MUSIC spectrum tends to zero when (r, \Psi) corresponds to a source location, owing to the orthogonality between the noise eigenvectors and the steering vector. Hence, a peak is obtained in the SH-MUSIC spectrum at the location of the source.
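The mechanics of Equations 7.25-7.26 can be sketched on synthetic data. The steering function below is a toy stand-in for a_nm(r, Ψ) = B(r)y^H(Ψ) of Equation 7.24 (any injective mapping from (range, angle) to a vector suffices for the demonstration); all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_steering(r, theta):
    # Hypothetical stand-in for the near-field steering vector of Equation 7.24
    n = np.arange(16)
    return np.exp(-1j * n * theta) / (1 + n * r)

# Two sources; 200 snapshots of the modal vector P_nm = A_nm s + noise
src = [(0.1, 0.6), (0.3, 1.1)]
A = np.stack([toy_steering(*s) for s in src], axis=1)          # (16, 2)
S = rng.standard_normal((2, 200)) + 1j * rng.standard_normal((2, 200))
P = A @ S + 0.01 * (rng.standard_normal((16, 200))
                    + 1j * rng.standard_normal((16, 200)))

R = P @ P.conj().T / P.shape[1]                                # Equation 7.26
eigval, eigvec = np.linalg.eigh(R)                             # ascending order
Qn = eigvec[:, :-2]                                            # noise subspace (L = 2)

def sh_music(r, theta):
    a = toy_steering(r, theta)                                 # Equation 7.25
    return 1.0 / np.real(a.conj() @ Qn @ Qn.conj().T @ a)

# The spectrum peaks near the true (range, angle) pairs
grid = [(r, t) for r in np.linspace(0.05, 0.4, 36)
        for t in np.linspace(0.0, 1.5, 31)]
best = max(grid, key=lambda p: sh_music(*p))
print(best)  # near (0.1, 0.6) or (0.3, 1.1)
```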

7.3.2 Spherical Harmonics MUSIC-Group Delay Method for Near-field Source Localization

SH-MUSIC utilizes the magnitude of a_{nm}^H Q_{nm}, as is clear from Equation 7.25. As discussed in Chapter 4, a sharp change in the unwrapped phase is seen at the locations of the sources. Hence, the negative differentiation of the unwrapped phase spectrum (group delay) results in peaks at the source locations. However, the group delay spectrum may sometimes have spurious peaks due to microphone calibration errors. The product of the MUSIC and group delay spectra, called MUSIC-Group delay, removes such spurious peaks and gives high resolution location estimates. The Spherical Harmonics MUSIC-Group delay (SH-MGD) spectrum for near-field source localization is formulated as

P_{SH-MGD}(r, \Psi) = \left( \sum_{u=1}^{U} |\nabla \, arg(a_{nm}^H(r, \Psi) q_u)|^2 \right) P_{SH-MUSIC}  (7.27)

where U = (N + 1)^2 - L, \nabla is the gradient operator, arg(.) indicates the unwrapped phase, and q_u represents the uth eigenvector of the noise subspace Q_{nm}. The first term within (.) is the group delay spectrum. The gradient is taken with respect to (r, \Psi).
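The group-delay computation of Equation 7.27 amounts to unwrapping the phase of a_nm^H q_u along the search grid, differentiating, and weighting the SH-MUSIC spectrum. The sketch below demonstrates this along a one-dimensional angular grid with a toy steering vector standing in for a_nm; all names are ours, and a small regularizer is added to avoid division by zero.

```python
import numpy as np

def mgd_spectrum(grid, afun, Qn, music):
    """Sketch of Equation 7.27 on a 1-D grid of candidate angles.

    afun(t) -> steering vector at angle t; Qn -> noise-subspace eigenvectors;
    music   -> SH-MUSIC spectrum evaluated on the same grid.
    """
    A = np.stack([afun(t) for t in grid])          # (G, D)
    proj = A.conj() @ Qn                           # a^H q_u for every u: (G, U)
    phase = np.unwrap(np.angle(proj), axis=0)      # unwrapped phase along the grid
    gd = np.gradient(phase, grid, axis=0)          # group delay (phase derivative)
    return np.sum(np.abs(gd) ** 2, axis=1) * music

# Toy single-source example: noise subspace from a rank-1 covariance
n = np.arange(8)
afun = lambda t: np.exp(-1j * n * t)
a0 = afun(0.9)                                     # source at 0.9 rad
R = np.outer(a0, a0.conj()) + 1e-4 * np.eye(8)
_, V = np.linalg.eigh(R)
Qn = V[:, :-1]                                     # all but the signal eigenvector

grid = np.linspace(0.5, 1.3, 81)
music = np.array([1.0 / (np.real(afun(t).conj() @ Qn @ Qn.conj().T @ afun(t))
                         + 1e-12) for t in grid])
p = mgd_spectrum(grid, afun, Qn, music)
print(grid[np.argmax(p)])                          # close to 0.9
```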


Figure 7.2: Illustration of range and elevation estimation by (a) SH-MUSIC method (b) SH-
MGD method (c) SH-MVDR method for fixed azimuth. Illustration of elevation and azimuth
estimation using (d) SH-MUSIC method (e) SH-MGD method (f) SH-MVDR method for
fixed range. The sources are at (0.06m,60◦ ,30◦ ) and (0.08m,55◦ ,40◦ ) at an SNR of 10dB.

7.3.3 Spherical Harmonics MVDR Method for Near-field Source Localization

The conventional MVDR minimizes the contribution of interference impinging on the array
from a direction other than the desired DOAs, while it maintains unity gain in the look direc-
tion. Under such conditions, the SH-MVDR power spectrum for near-field source localization

can be written as

P_{SH-MVDR}(r, \Psi) = \frac{1}{a_{nm}^H(r, \Psi) R_{P_{nm}}^{-1} a_{nm}(r, \Psi)}.  (7.28)

Figure 7.2 illustrates the performance of SH-MUSIC, SH-MGD and SH-MVDR for range and bearing estimation using a spherical microphone array. The simulation was done considering a rigid sphere with two closely spaced sources at (0.06m, 60°, 30°), (0.08m, 55°, 40°) and an SNR of 10dB. Figures 7.2(a), 7.2(b) and 7.2(c) show plots corresponding to range and elevation estimation with known azimuth. Plots in Figures 7.2(d), 7.2(e) and 7.2(f) show the azimuth and elevation of the sources at the given range. It can be noted that SH-MUSIC and SH-MGD, being subspace-based methods, have higher resolution than SH-MVDR. The high resolution of SH-MGD is due to the additive property of the group delay spectrum. A mathematical proof of the additive property of the spatial domain MUSIC-Group delay spectrum is provided in Section 4.2.3. All the spatial domain results are also valid in the spherical harmonics domain [50]. Hence, the additive property of the group delay spectrum also holds in the spherical harmonics domain.
Having developed the data model and localized the sources, the near-field beampattern is presented in the ensuing Section. The beampattern is presented for the near-field MVDR spatial filter, which preserves unity gain in the look direction while minimizing the output power. The estimated location can be used for steering the array in the look direction for spatial filtering.

7.4 The Near-field MVDR Beampattern Analysis

Various radial compensation filters have been utilized in [50, 24] for the design of near-field beampatterns. However, all these radial filters are designed assuming rotational symmetry around the look direction. The weight vector in this context is given as W_{nm} = [W_{00}, W_{1(-1)}, W_{10}, W_{11}, \cdots, W_{NN}]^T, where

W_{nm} = \frac{d_n}{b_n(k, r, r_s)} Y_n^{m*}(\Psi_s).  (7.29)

W_{nm}(k) is the spherical Fourier transform of W(k). The dependency on k is dropped for simplicity. Here d_n is the design parameter for controlling the beampattern, \Psi_s is the look direction and r_s is the look distance. A discussion on some optimal beamforming techniques is also given in [50, Chapter 11]. However, these optimal beamforming techniques are limited to far-field

sources only. Here we present optimal near-field beamforming, in particular MVDR, a widely used method. As seen in the case of localization, beamforming techniques can also be re-formulated in the spherical harmonics domain, where the formulation parallels that in the spatial domain [50]. The MVDR problem formulation in the spherical harmonics domain is given as

\min_{W_{nm}} W_{nm}^H R_{P_{nm}} W_{nm} \quad subject to \quad W_{nm}^H a_{nm} = 1  (7.30)

where a_{nm} is the steering vector. The solution to the given optimization problem in Equation 7.30 is given as

W_{nm} = \frac{R_{P_{nm}}^{-1} a_{nm}}{a_{nm}^H R_{P_{nm}}^{-1} a_{nm}}.  (7.31)

Utilizing Equation 7.24, the MVDR weights for steering to a source at location r_s = (r_s, \Psi_s) are given by

W_{nm} = \frac{R_{P_{nm}}^{-1} B(r_s) y^H(\Psi_s)}{y(\Psi_s) B^H(r_s) R_{P_{nm}}^{-1} B(r_s) y^H(\Psi_s)}.  (7.32)

The MVDR beampattern can now be computed utilizing

G = \left| W_{nm}^H B(r_l) y^H(\Psi_l) \right|  (7.33)

where r_l is varied as in Equation 7.13 and \Psi_l takes values (0 \leq \theta_l \leq \pi, 0 \leq \phi_l \leq 2\pi).
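A compact numerical sketch of Equations 7.31 and 7.33 is given below, with toy steering vectors standing in for B(r_s)y^H(Ψ_s); it verifies the unity constraint gain toward the look direction and the strong attenuation of the interferer. All names are ours.

```python
import numpy as np

def mvdr_weights(R, a):
    # Equation 7.31: w = R^{-1} a / (a^H R^{-1} a)
    Ria = np.linalg.solve(R, a)
    return Ria / (a.conj() @ Ria)

# Toy covariance built from a desired and an interfering steering vector
n = np.arange(12)
a_des = np.exp(-1j * 0.4 * n)
a_int = np.exp(-1j * 1.1 * n)
R = np.outer(a_des, a_des.conj()) + 4 * np.outer(a_int, a_int.conj()) \
    + 0.1 * np.eye(12)

w = mvdr_weights(R, a_des)
# Equation 7.33: beampattern gain |w^H a| -- unity toward the look direction,
# strongly attenuated toward the interferer
print(np.isclose(abs(w.conj() @ a_des), 1.0))   # True
print(abs(w.conj() @ a_int) < 0.1)              # True
```

The unity gain toward a_des is exact by construction of the constraint; the attenuation of a_int grows with the interferer-to-noise ratio.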

7.5 Cramér-Rao Bound Analysis

Although spherical microphone arrays are extensively used for source localization [12, 13, 14, 15, 16, 17, 86], CRB analysis in the spherical harmonics domain has been investigated only sparsely. A CRB expression for the far-field data model in the spherical harmonics domain can be found in Section 5.6. The far-field data model for a spherical microphone array, as derived in Chapter 5, is given as

D_{nm}(\Psi; k) = Y^H(\Psi)S(k) + Z_{nm}(k)  (7.34)

with Y^H(\Psi) being the far-field steering matrix. Comparing the far-field data model in Equation 7.34 with the near-field data model in Equation 7.22, the expression of the Fisher information matrix for the near-field observation model can be obtained as in Section 5.6.2, by replacing
Y^H(\Psi) with A_{nm} and R_D with R_P. Hence, the Fisher information matrix elements can be written as

F_{r\theta} = 2Re\left\{ (R_S A_{nm}^H R_P^{-1} A_{nm} R_S)^T \odot (\dot{A}_{nm_r}^H R_P^{-1} \dot{A}_{nm_\theta}) + (R_S A_{nm}^H R_P^{-1} \dot{A}_{nm_r})^T \odot (R_S A_{nm}^H R_P^{-1} \dot{A}_{nm_\theta}) \right\}  (7.35)

F_{\theta\phi} = 2Re\left\{ (R_S A_{nm}^H R_P^{-1} A_{nm} R_S)^T \odot (\dot{A}_{nm_\theta}^H R_P^{-1} \dot{A}_{nm_\phi}) + (R_S A_{nm}^H R_P^{-1} \dot{A}_{nm_\theta})^T \odot (R_S A_{nm}^H R_P^{-1} \dot{A}_{nm_\phi}) \right\}  (7.36)

where R_P = E[P_{nm}(k) P_{nm}^H(k)], R_S = E[S(k) S^H(k)], \odot represents the Hadamard product and (\,\dot{}\,) denotes the derivative. The unknown parameter vector is

\alpha = [r^T \; \theta^T \; \phi^T]^T  (7.37)

with r = [r_1 \cdots r_L]^T, \theta = [\theta_1 \cdots \theta_L]^T and \phi = [\phi_1 \cdots \phi_L]^T. It is to be noted from Equation 7.23 that the parameters (r_l, \theta_l, \phi_l) are present only in the lth column of A_{nm}. Hence, the definition of vector and scalar differentiation used herein is

\dot{A}_{nm_r} = \sum_{l=1}^{L} \dot{A}_{nm_{r_l}}  (7.38)

and \dot{A}_{nm_{r_l}} = \frac{\partial A_{nm}}{\partial r_l}.  (7.39)

The differentiation with respect to the other parameters can be written in a similar way. The derivative of the near-field steering matrix A_{nm} involves differentiation of the spherical Hankel function and the spherical harmonics, and is computed by utilizing the partial derivatives of the near-field mode strength and the spherical harmonics function. The detailed computation of the derivative of the near-field steering matrix is given in Appendix B.
The other blocks of the FIM can be written in a similar way. The final FIM is given as

F = \begin{bmatrix} F_{rr} & F_{r\theta} & F_{r\phi} \\ F_{\theta r} & F_{\theta\theta} & F_{\theta\phi} \\ F_{\phi r} & F_{\phi\theta} & F_{\phi\phi} \end{bmatrix}.

The bound is obtained from the inverse of the Fisher information matrix. The Cramér-Rao bound is plotted in Figure 7.3 for various SNRs. The CRB plots for a random signal and for a sinusoidal signal are illustrated in Figures 7.3(a) and 7.3(b) respectively. It may be noted that the CRB


Figure 7.3: Cramér-Rao bound analysis at various SNR, (a) for random signal (b) for sinu-
soidal signal. The source location is (0.08m, 40◦ , 50◦ ).

for the random signal is lower than the CRB for the sinusoidal signal. Also, a lower CRB is attained at higher SNR.

7.6 Performance Evaluation

Experiments on source localization are conducted to evaluate the proposed methods. The significance of the proposed methods in near-field beamforming is also presented by conducting an experiment on suppression of undesired sources in the near-field. In addition to simulation experiments, experiments are performed on real signals acquired over a spherical microphone array. The experiments utilize an Eigenmike system, shown in Figure 7.5, which consists of 32 microphones embedded in a rigid sphere of radius 4.2cm. The order of the microphone array is taken to be 4. Four-dimensional scatter plots, root-mean-square error (RMSE) and probability of resolution measures are used to evaluate the source localization performance of the proposed methods.

7.6.1 Experiments on Near-field Source Localization

In this Section, experiments on near-field source localization are conducted. In particular, the ability of the proposed methods to radially discriminate aligned sources is analyzed. Two narrowband sources with locations r_1 = (0.1, 30°, 45°) and r_2 = (0.8, 30°, 45°) are taken for the analysis. The frequencies of the sources are taken to be 220Hz and 250Hz respectively. It is to be noted that both sources have the same DOA. However, they are well separated radially. The DOA of the sources is assumed to be known, and the range is estimated at various SNRs.

7.6.1.1 RMSE Analysis of Range Estimation

The relative performance of the various proposed methods is presented herein using the cumulative RMSE. The cumulative RMSE is computed as

RMSE = \sqrt{ \frac{1}{2T} \sum_{t=1}^{T} \sum_{l=1}^{2} (r_l - \hat{r}_l^{(t)})^2 },  (7.40)

where t indicates the trial number, T is the total number of trials and l denotes the source number. The RMSE results are presented in Table 7.1 for 100 independent trials. It can be concluded that a high SNR is required for resolving sources in the same direction with different radial distances. At low SNRs, the SH-MUSIC and SH-MVDR methods are unable to resolve the sources. The SH-MGD method performs reasonably well even at low SNRs. At high SNRs, all the methods perform equally well, giving low RMSE.
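The cumulative RMSE of Equation 7.40 (read as a root mean square over both trials and sources) is straightforward to compute; the synthetic estimates below are purely illustrative, and the function name is ours.

```python
import numpy as np

def cumulative_rmse(r_true, r_hat):
    """Cumulative RMSE over T trials and 2 sources (Equation 7.40).

    r_true : (2,) true ranges
    r_hat  : (T, 2) per-trial range estimates
    """
    T = r_hat.shape[0]
    return np.sqrt(np.sum((r_hat - r_true) ** 2) / (2 * T))

# Example with synthetic estimates scattered around true ranges 0.1 m and 0.8 m
rng = np.random.default_rng(2)
r_true = np.array([0.1, 0.8])
r_hat = r_true + 0.02 * rng.standard_normal((100, 2))
print(cumulative_rmse(r_true, r_hat))  # about 0.02
```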

Table 7.1: Cumulative RMSE in range r, at various SNRs for 100 iterations. Sources are at (0.1m, 30°, 45°) and (0.8m, 30°, 45°).

Methods   | SNR (10dB) | SNR (20dB) | SNR (30dB) | SNR (40dB)
SH-MGD    | 0.0847     | 0.0785     | 0.0389     | 0.0217
SH-MUSIC  | 0.495      | 0.495      | 0.2891     | 0.0049
SH-MVDR   | 0.495      | 0.495      | 0.495      | 0.0562

7.6.1.2 Statistical Analysis of Range Estimation

Statistical analysis of range estimation is presented herein, using the probability of resolution. The experimental conditions are similar to those used in the previous Section. The probability of resolution for range is defined as

P_r = \frac{1}{2T} \sum_{t=1}^{T} \sum_{l=1}^{2} Pr\left( |r_l - \hat{r}_l^{(t)}| \leq \zeta \right) = \frac{1}{2T} \sum_{t=1}^{T} \sum_{l=1}^{2} sgn\left( \zeta - |r_l - \hat{r}_l^{(t)}| \right),  (7.41)

where Pr(.) denotes the probability of an event, and sgn(x) is the signum function defined in Equation 5.55. The confidence interval is taken as \zeta = 0.08m. The relative performance of the proposed methods is presented using the bar plot in Figure 7.4.


Figure 7.4: Range estimation performance of SH-MGD, SH-MUSIC and SH-MVDR in terms
of probability of resolution.

It is to be noted that the probability of resolution for SH-MGD is high even at low SNR
when compared to SH-MUSIC and SH-MVDR.

7.6.2 Experiments on Joint Range and Bearing Estimation

In most applications involving near-field communication, one of the parameters (range, azimuth or elevation) can be assumed to be constant. However, in this section, experiments on joint range and bearing (azimuth and elevation) estimation are presented. The experiments are performed for both simulated and actual signals acquired from a spherical microphone array. An Eigenmike system is utilized in an anechoic chamber for acquiring the signals. Experimental results are illustrated using four-dimensional scatter plots.

7.6.2.1 Experimental Setup

An experimental set-up for near-field source localization using the Eigenmike system is shown in Figure 7.5. For the real experiments, the signal is recorded in an anechoic chamber using the Eigenmike system. A smartphone speaker is utilized as the acoustic source. The source is fixed at location (0.3m, 90°, 90°). A narrowband signal with a frequency of 600Hz is used.

Figure 7.5: The Eigenmike setup in an anechoic chamber at IIT Kanpur for acquiring near-
field sources. A near-field source is placed at (0.3m, 90◦ , 90◦ ).

7.6.2.2 Experimental Results

Experimental results are presented using four-dimensional scatter plots for both simulated signals and signals acquired from the spherical microphone array. The SH-MUSIC and SH-MGD spatial spectra, as proposed in Equations 7.25 and 7.27, are utilized for simultaneous estimation of the range and bearing of a source. The near-field source localization scatter plots are shown for SH-MUSIC and SH-MGD in Figure 7.6. The magnitude of the SH-MUSIC and SH-MGD spectra is represented by a color bar.

Figure 7.6: Four-dimensional scatter plots using (a) SH-MUSIC for the simulated signal, (b) SH-MGD for the simulated signal, (c) SH-MUSIC for the signal acquired over the SMA, (d) SH-MGD for the signal acquired over the SMA. A narrowband source with frequency 600Hz, located at (0.3m, 90°, 90°), is considered.

Figures 7.6(a) and 7.6(b) correspond to source localization results for the simulated signal. The candidate location corresponding to the highest magnitude of SH-MUSIC and SH-MGD is represented by a square in both figures. However, in the SH-MUSIC spectrum (Figure 7.6(a)), an additional competing peak can be seen, represented by a brown circle. On the other hand, the SH-MGD spectrum in Figure 7.6(b) shows a single candidate peak. Both methods are able to estimate the source location.
Experimental results corresponding to the signal acquired over the spherical microphone array in an anechoic chamber are shown in Figures 7.6(c) and 7.6(d) for SH-MUSIC and SH-MGD respectively. It can be noted that the SH-MUSIC spectrum has many spurious peaks, which are greatly reduced in the SH-MGD spectrum. In the SH-MGD spectrum, peaks can be observed clearly for elevation and azimuth varying from 85° to 90° at range 0.36m, which is close to the location of the source, (0.3m, 90°, 90°). It is to be noted that the errors in source localization are due to reflection of sound from the tripods, the non-point nature of the sound source, and microphone-source physical placement error.

7.6.3 Experiments on Interference Suppression using Near-field MVDR Beamforming

Two near-field sources are considered at (0.1m, 50°, 30°) and (0.3m, 55°, 40°). The source close to the array is assumed to be the desired source, while the other is the interference. The suppression of the interfering source is illustrated using the near-field MVDR beampattern as in Equation 7.33. The MVDR beampattern in this context is plotted in Figure 7.7. The plot

Figure 7.7: Illustration of the near-field MVDR beampattern. The desired source is at (0.1m, 50°, 30°), and the interfering source at (0.3m, 55°, 40°).

is illustrated for azimuth and range with known elevations. As expected, the array gain is close to 0dB (undistorted) for the desired source, while the interfering source suffers very high attenuation.
From the given beampattern, an important observation and a possible use case can be drawn. Figure 7.7 is plotted again in Figure 7.8, with fixed range and varying azimuth. Additional 2-D figures are also given with varying range and fixed azimuth for the purpose of clarity.


Figure 7.8: Radial filtering analysis of the proposed near-field MVDR method over a spherical
microphone array. (a) Array gain for fixed r = 0.1m. (b) Array gain for fixed r = 0.3m. (c)
Array gain for fixed θ = 30◦ . (d) Array gain for fixed θ = 40◦ .

It is to be noted that for the steering distance set at the desired source (r = 0.1m), the array gain is close to 0dB for all azimuths (Figure 7.8(a)). The gain variation within a small range of azimuth is minimal. On the other hand, the gain deteriorates significantly at the undesired radial distance, as shown in Figure 7.8(b); this holds for all look directions. Figures 7.8(c) and 7.8(d) indicate the gain variation with source distance at fixed azimuth. In order to study the radial filtering ability of the proposed method, a desired source azimuth angle of 30° and radial distance of 0.1m are considered. It may be noted from Figure 7.8(c) that the array gain is close to 0dB at the desired radial distance of 0.1m, and decreases significantly at the undesired radial distance of 0.3m. Subsequently, the array gain is observed for the undesired source azimuthal angle of 40° in Figure 7.8(d). The array gain at the desired radial distance is still close to 0dB, while it again decreases significantly at the undesired radial distance of 0.3m.
This implies that for a microphone array steered in the near-field, the MVDR beampattern is robust to minor variations of angle but very sensitive to distance. This was observed in all simulations. Hence, it can be concluded that the proposed near-field MVDR spatial filter is very sensitive to the radial parameter, while being robust to minor azimuth angle variations. This can be utilized in near-field communication applications where minor variation in the azimuth angle of the desired source is expected. One such application can be visualized when a microphone array is integrated in a cellphone. A speaker in the near-field of such a microphone array can change his azimuth angle, while interfering sources are generally not present at the desired radial distance.

7.7 Summary and Contributions

In this chapter, a data model for near-field source localization in the spherical harmonics domain is proposed. Three methods, namely SH-MUSIC, SH-MGD and SH-MVDR, are formulated for localization of near-field sources. The methods are verified using simulations and signals acquired over a spherical microphone array in an anechoic chamber. Formulation and analysis of the Cramér-Rao bound for near-field sources in the spherical harmonics domain is also presented. Experiments are conducted for radially separated aligned sources. The proposed methods are evaluated using RMSE and statistical analysis. The significance and practical application of the proposed methods is discussed using an experiment on interference suppression. In this context, the near-field MVDR beampattern analysis promises robust near-field communication, which can be investigated as future work.
Chapter 8

Conclusions and Future Directions

8.1 Conclusions

This thesis addresses the source localization problem in the spatial and spherical harmonics domains. The spatio-temporal array data model is presented starting from first principles in physics. Subsequently, the spatio-frequency and spherical harmonics data models are also discussed. A new spherical harmonics data model is developed for near-field source localization using a spherical microphone array. Novel methods for acoustic source localization are proposed in the spatial and spherical harmonics domains.
In the spatial domain, a novel high resolution source localization method based on the MUSIC-Group delay spectrum is proposed. The method provides robust azimuth and elevation estimates of closely spaced sources when compared to conventional source localization methods, as indicated by the source localization experiments. The additive property of the group delay function in the spatial domain is proved mathematically to explain the resolving power of the proposed method. The significance of the MUSIC-Group delay method in speech enhancement and distant speech recognition is illustrated using improvements in signal-to-interference ratios and lower word error rates.
Signal processing in the spherical harmonics domain provides ease of beamforming and efficient array processing due to reduced dimensionality. Both far-field and near-field source localization problems are addressed in the spherical harmonics domain. The MUSIC-Group delay method is formulated in the spherical harmonics domain (called SH-MGD) for far-field source localization. The high resolution capability of SH-MGD makes it more relevant when compared to conventional methods like SH-MUSIC and SH-MVDR. An additional source localization algorithm, called SH-root-MUSIC, is also presented for azimuth-only estimation of far-field sources. It retains all the inherent advantages of root-MUSIC, including lower computational complexity. The Vandermonde structure of the array manifold is also illustrated using the manifold separation technique.
Near-field source localization over a spherical microphone array is also addressed in this thesis. A new data model for near-field source localization using a spherical microphone array is developed. In particular, three methods, namely SH-MUSIC, SH-MGD and SH-MVDR, that jointly estimate the range and bearing of multiple sources are proposed. Results of the near-field MVDR beampattern analysis promise robust near-field communication applications. Additionally, the stochastic Cramér-Rao bound for the far-field and near-field data models is formulated in the spherical harmonics domain to evaluate the location estimators. The ability of the proposed methods to radially discriminate aligned sources is also analyzed.

8.2 Future Directions

The near-field data model developed in this thesis can be incorporated in a sparsity-based framework for source localization. As part of future work, sparsity-based methods for near-field source localization can be explored. The near-field model can be transformed into a sparse recovery problem where the signal vector is assumed to be sparse. The problem can then be solved using an l1-regularized least-squares method.
The near-field MVDR beampattern analysis result is encouraging. Using the near-field data model developed, a near-field MVDR spatial filter is designed. The near-field MVDR spatial filter designed in this thesis exhibits high radial filtering efficiency, while being robust to minor azimuth angle variations. This can be utilized in near-field communication applications where minor variation in the azimuth angle of the desired source is expected. One such application can be visualized when a microphone array is integrated in a cellphone. A speaker in the near-field of such a microphone array can change his azimuth angle, while interfering sources are generally not present at the desired radial distance. Utilization of the near-field MVDR spatial filter in real-life applications needs further investigation.
The techniques developed in this thesis, along with spherical near-field acoustic holography (NAH), can be utilized for simultaneous source localization and separation. Near-field acoustic holography techniques for source localization assume the sources to be in the near-field. The NAH techniques are utilized for localization of various automotive noise sources such as wind noise, tire noise and accessory noise. The near-field localization and beamforming techniques presented in this thesis can be investigated further for automotive noise source identification and reduction of noise levels.
Appendices
Appendix A

Stochastic Cramér-Rao Bound in Spherical Harmonics Domain

The array data model formulated in Chapter 5 will be utilized here for the stochastic Cramér-Rao bound (CRB) derivation. The data model and model covariance matrix are re-written from Equations 5.28 and 5.47:

[D_{nm}(k)]_{(N+1)^2 \times N_s} = [Y^H(\Psi)]_{(N+1)^2 \times L} [S(k)]_{L \times N_s} + [Z_{nm}(k)]_{(N+1)^2 \times N_s}  (A.1)

R_D = E[D_{nm}(k) D_{nm}(k)^H] = Y^H(\Psi) R_S(k) Y(\Psi) + \sigma^2 C  (A.2)

Y^H(\Psi) is the steering matrix. A particular lth steering vector can be written as

y^H(\Psi_l) = [Y_0^{0*}(\Psi_l), Y_1^{-1*}(\Psi_l), Y_1^{0*}(\Psi_l), Y_1^{1*}(\Psi_l), \ldots, Y_N^{N*}(\Psi_l)]^T  (A.3)

where the spherical harmonic of order n and degree m can be written from Section 2.5 as

Y_n^m(\Phi) = \sqrt{\frac{(2n + 1)(n - m)!}{4\pi(n + m)!}} P_n^m(\cos\theta) e^{jm\phi}  (A.4)

\forall \; 0 \leq n \leq N, \; -n \leq m \leq n.

A closed-form expression for the stochastic CRB(DOA) is presented herein. Hence, the unknown direction parameter vector taken here is

\alpha = [\theta^T \; \phi^T]^T  (A.5)

where \theta = [\theta_1 \cdots \theta_L]^T and \phi = [\phi_1 \cdots \phi_L]^T.



A.1 Formulation of Fisher Information Matrix

The formulation of the Fisher information matrix (FIM) is presented herein. The CRB is computed from the inverse of the Fisher information matrix. The Fisher information matrix elements given by Equation 5.52 can be further simplified to [55]

F_{rs} = tr\left\{ R_D^{-1} \frac{\partial R_D}{\partial \alpha_r} R_D^{-1} \frac{\partial R_D}{\partial \alpha_s} \right\}.  (A.6)

It is to be noted from Equation A.1 that for N_s wavenumbers (FFT indices) corresponding to N_s snapshots [26], the Fisher information matrix elements will be N_s times those given in Equation A.6. The parameters (\theta_r, \phi_r) are present in the rth column of Y^H(\Psi). Hence, the following notational definition for the vector derivative of the steering matrix Y^H(\Psi) is used,

\dot{Y}_\theta^H = \sum_{r=1}^{L} \dot{Y}_{\theta_r}^H  (A.7)

with \dot{Y}_{\theta_r}^H \triangleq \frac{\partial Y^H}{\partial \theta_r}. The scalar derivative \dot{Y}_{\theta_r}^H can be extracted from the vector derivative in Equation A.7 as

\dot{Y}_{\theta_r}^H = \dot{Y}_\theta^H e_r e_r^T  (A.8)

where er is the rth column vector of an identity matrix. These vector and scalar derivative
of steering matrix is used in ensuing formulation of CRB. Also, Y(Ψ) is replaced with Y for
equations to be more compact.
Utilizing Equation A.2, the partial derivative of covariance matrix RD with respect to
variable θr can be written as

∂RD
= ẎθHr RS Y + YH RS Ẏθr (A.9)
∂θr

Substituting this in Equation A.6 and making use of the distributive property of matrix multiplication, the FIM element can be expressed as


F_{θr,φs} = tr{ R_D^{−1} Ẏ_{θr}^H R_S Y R_D^{−1} Ẏ_{φs}^H R_S Y + R_D^{−1} Ẏ_{θr}^H R_S Y R_D^{−1} Y^H R_S Ẏ_{φs}
         + R_D^{−1} Y^H R_S Ẏ_{θr} R_D^{−1} Ẏ_{φs}^H R_S Y + R_D^{−1} Y^H R_S Ẏ_{θr} R_D^{−1} Y^H R_S Ẏ_{φs} }   (A.10)

Utilizing tr(A + B) = tr(A) + tr(B) and rewriting Equation A.10 in short form, the FIM

element is given by

F_{θr,φs} = tr(z) + tr(x^H) + tr(w) + tr(y^H)   (A.11)

where x = (R_D^{−1} Ẏ_{θr}^H R_S Y R_D^{−1} Y^H R_S Ẏ_{φs})^H; the terms z, w, and y are defined analogously from the remaining traces of Equation A.10.

With suitable pairing and utilizing the Hermitian symmetry of the covariance matrices, x can be rewritten as

x = Ẏ_{φs}^H R_S Y R_D^{−1} Y^H R_S Ẏ_{θr} R_D^{−1}   (A.12)

tr(x) = tr(Ẏ_{φs}^H R_S Y R_D^{−1} Y^H R_S Ẏ_{θr} R_D^{−1})   (A.13)

Further, utilizing the cyclic property of trace, tr(AB) = tr(BA), we have

tr(x) = tr(R_D^{−1} Y^H R_S Ẏ_{θr} R_D^{−1} Ẏ_{φs}^H R_S Y) = tr(w).

Similarly, tr(y) = tr(z).   (A.14)


Noting the trace property tr(x^H) = (tr(x))^*, where ^* denotes the complex conjugate, and utilizing the results of Equation A.14 in Equation A.11, the FIM elements can now be written as

F_{θr,φs} = 2Re{ tr(x) + tr(y) }
         = 2Re{ tr(Ẏ_{φs}^H R_S Y R_D^{−1} Y^H R_S Ẏ_{θr} R_D^{−1}) + tr(Ẏ_{φs}^H R_S Y R_D^{−1} Ẏ_{θr}^H R_S Y R_D^{−1}) }

Utilizing the relations in Equations A.7 and A.8,

F_{θr,φs} = 2Re{ tr(Ẏ_φ^H e_s e_s^T R_S Y R_D^{−1} Y^H R_S e_r e_r^T Ẏ_θ R_D^{−1})
         + tr(Ẏ_φ^H e_s e_s^T R_S Y R_D^{−1} Ẏ_θ^H e_r e_r^T R_S Y R_D^{−1}) }

       = 2Re{ e_s^T R_S Y R_D^{−1} Y^H R_S e_r · e_r^T Ẏ_θ R_D^{−1} Ẏ_φ^H e_s
         + e_s^T R_S Y R_D^{−1} Ẏ_θ^H e_r · e_r^T R_S Y R_D^{−1} Ẏ_φ^H e_s }

Hence, the FIM block can finally be written as

F_θφ = 2Re{ (R_S Y R_D^{−1} Y^H R_S)^T ⊙ (Ẏ_θ R_D^{−1} Ẏ_φ^H) + (R_S Y R_D^{−1} Ẏ_θ^H)^T ⊙ (R_S Y R_D^{−1} Ẏ_φ^H) }   (A.15)

where ⊙ denotes the Hadamard product, defined for two matrices as

(X ⊙ Z)_{rs} ≜ (X)_{rs} (Z)_{rs}.   (A.16)

Similar to Equation A.15, the block of the FIM involving only one parameter vector, F_θθ, can be written as

F_θθ = 2Re{ (R_S Y R_D^{−1} Y^H R_S)^T ⊙ (Ẏ_θ R_D^{−1} Ẏ_θ^H) + (R_S Y R_D^{−1} Ẏ_θ^H)^T ⊙ (R_S Y R_D^{−1} Ẏ_θ^H) }.   (A.17)

F_φφ and F_φθ can be expressed in a similar way.
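As a numerical sanity check on the algebra above, the sketch below (Python/NumPy; a toy order-1 array with two sources, all variable names ours) computes the cross block F_θφ twice: element-wise from Equations A.6 and A.9, and in the Hadamard form of Equation A.15, with the steering-matrix derivatives obtained by central finite differences. The two computations agree to machine precision, since Equation A.15 is an exact rearrangement of Equation A.6:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, th, ph):
    # Spherical harmonic of Equation A.4 (lpmv supplies P_n^m)
    ma = abs(m)
    c = np.sqrt((2*n + 1) * factorial(n - ma) / (4*np.pi * factorial(n + ma)))
    y = c * lpmv(ma, n, np.cos(th)) * np.exp(1j*ma*ph)
    return (-1)**ma * np.conj(y) if m < 0 else y

N, L = 1, 2                                   # array order, number of sources
nm = [(n, m) for n in range(N + 1) for m in range(-n, n + 1)]
th, ph = np.array([0.7, 1.2]), np.array([0.4, 2.0])

def col(t, p):                                # one column of Y^H(Psi), Eq. A.3
    return np.array([np.conj(Ynm(n, m, t, p)) for n, m in nm])

A = np.column_stack([col(th[l], ph[l]) for l in range(L)])
eps = 1e-6                                    # finite-difference derivative columns
Ath = np.column_stack([(col(th[l]+eps, ph[l]) - col(th[l]-eps, ph[l]))/(2*eps)
                       for l in range(L)])
Aph = np.column_stack([(col(th[l], ph[l]+eps) - col(th[l], ph[l]-eps))/(2*eps)
                       for l in range(L)])

Rs = np.diag([1.0, 0.7])                      # source covariance; sigma^2 C = 0.1 I
Rd = A @ Rs @ A.conj().T + 0.1 * np.eye(len(nm))      # Equation A.2
Ri = np.linalg.inv(Rd)

# Element-wise FIM block via Equations A.6 and A.9
F_direct = np.zeros((L, L))
for r in range(L):
    for s in range(L):
        Er = np.zeros_like(A); Er[:, r] = Ath[:, r]   # Ydot_theta_r^H
        Es = np.zeros_like(A); Es[:, s] = Aph[:, s]   # Ydot_phi_s^H
        dRr = Er @ Rs @ A.conj().T + A @ Rs @ Er.conj().T
        dRs = Es @ Rs @ A.conj().T + A @ Rs @ Es.conj().T
        F_direct[r, s] = np.real(np.trace(Ri @ dRr @ Ri @ dRs))

# Hadamard form of Equation A.15 (here Y = A^H, Ydot_theta^H = Ath)
G = Rs @ A.conj().T @ Ri @ A @ Rs
F_hadamard = 2 * np.real(G.T * (Ath.conj().T @ Ri @ Aph)
                         + (Rs @ A.conj().T @ Ri @ Ath).T
                         * (Rs @ A.conj().T @ Ri @ Aph))
print(np.max(np.abs(F_direct - F_hadamard)))  # ≈ 0 (machine precision)
```

Because both computations reuse the same derivative matrices, the agreement tests the matrix rearrangement itself, not the finite-difference accuracy.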

A.2 Computing the Derivative of Spherical Harmonics Function Ynm

From Equations A.3 and A.4, the vector derivative Ẏφ can be found using

∂Y_n^m(Ψ_s)/∂φ_s = jm Y_n^m(Ψ_s).   (A.18)
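Equation A.18 follows directly from the e^{jmφ} factor in Equation A.4 and is easy to confirm numerically; the sketch below (Python, helper name `Ynm` ours) compares it with a central finite difference. The check is insensitive to the phase convention of `scipy.special.lpmv`, since both sides use the same function:

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, th, ph):
    # Equation A.4 for m >= 0 (sufficient for this check)
    c = np.sqrt((2*n + 1) * factorial(n - m) / (4*np.pi * factorial(n + m)))
    return c * lpmv(m, n, np.cos(th)) * np.exp(1j*m*ph)

n, m, th, ph, h = 3, 2, 0.9, 1.3, 1e-6
fd = (Ynm(n, m, th, ph + h) - Ynm(n, m, th, ph - h)) / (2*h)
print(abs(fd - 1j*m*Ynm(n, m, th, ph)))   # small: matches Equation A.18
```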
Computing Ẏ_θ involves differentiation of the associated Legendre function. The derivative of the associated Legendre polynomial can be expressed using the following recurrence relations [108],

(2n + 1) z P_n^m(z) = (n + m) P_{n−1}^m(z) + (n − m + 1) P_{n+1}^m(z)   (A.19)

∂P_n^m(z)/∂z = (1/(z² − 1)) [z n P_n^m(z) − (m + n) P_{n−1}^m(z)]   (A.20)
This leads to the derivative of the associated Legendre polynomial given by

∂P_n^m(z)/∂z = (1/(z² − 1)) [(n − m + 1) P_{n+1}^m(z) − (n + 1) z P_n^m(z)].   (A.21)
For z = cos θ, the derivative becomes

∂P_n^m(cos θ)/∂θ = (1/sin θ) [(n − m + 1) P_{n+1}^m(cos θ) − (n + 1) cos θ P_n^m(cos θ)].
Now, Ẏ_θ can be computed by utilizing the following in Equation A.7:

∂Y_n^m(Ψ_r)/∂θ_r = √[(2n + 1)(n − m)! / (4π(n + m)!)] e^{jmφ_r} (1/sin θ_r) [(n − m + 1) P_{n+1}^m(cos θ_r) − (n + 1) cos θ_r P_n^m(cos θ_r)]   (A.22)
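Equation A.22 can likewise be checked against a central finite difference (Python sketch; helper names `Ynm` and `dYnm_dtheta` are ours, with `scipy.special.lpmv` supplying the associated Legendre function, whose phase convention cancels in the comparison):

```python
import numpy as np
from math import factorial
from scipy.special import lpmv

def Ynm(n, m, th, ph):
    # Equation A.4 for m >= 0 (sufficient for this check)
    c = np.sqrt((2*n + 1) * factorial(n - m) / (4*np.pi * factorial(n + m)))
    return c * lpmv(m, n, np.cos(th)) * np.exp(1j*m*ph)

def dYnm_dtheta(n, m, th, ph):
    # Equation A.22
    c = np.sqrt((2*n + 1) * factorial(n - m) / (4*np.pi * factorial(n + m)))
    return (c * np.exp(1j*m*ph) / np.sin(th)
            * ((n - m + 1) * lpmv(m, n + 1, np.cos(th))
               - (n + 1) * np.cos(th) * lpmv(m, n, np.cos(th))))

n, m, th, ph, h = 3, 2, 0.9, 0.5, 1e-6
fd = (Ynm(n, m, th + h, ph) - Ynm(n, m, th - h, ph)) / (2*h)
print(abs(fd - dYnm_dtheta(n, m, th, ph)))   # small (finite-difference error)
```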
Appendix B

Computing the Derivative of Near-field Steering Matrix

In this Appendix, we provide the necessary formulae for finding the derivative of the near-field steering matrix. The near-field steering matrix can be written from Equation 7.23 as

A_nm(r, Ψ) = [B(r_1) y^H(Ψ_1), · · · , B(r_l) y^H(Ψ_l), · · · , B(r_L) y^H(Ψ_L)].   (B.1)

As the parameters (r_l, θ_l, φ_l) are present only in the lth column of A_nm, we need only the derivative of that column; the remaining columns produce zero vectors. Hence, the derivative of the steering matrix w.r.t. range (Equation 7.39) turns out to be

Ȧ_{nm,r_l} = ∂A_nm/∂r_l = [0, 0, · · · , (∂B(r_l)/∂r_l) y^H(Ψ_l), · · · , 0, 0]   (B.2)
From Equations 7.18 and 7.10, it is clear that the above partial derivative involves differentiation of the spherical Hankel function. This can be found using the following recurrence relations [27],
((2n + 1)/x) h_n(x) = h_{n−1}(x) + h_{n+1}(x)   (B.3)

h′_n(x) = h_{n−1}(x) − ((n + 1)/x) h_n(x).   (B.4)

The above recurrence relations lead to

h′_n(x) = (n/x) h_n(x) − h_{n+1}(x)   (B.5)

or, ∂h_n(kr_l)/∂r_l = (n/r_l) h_n(kr_l) − k h_{n+1}(kr_l)   (B.6)
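The recurrences above can be checked against SciPy, which differentiates the spherical Bessel functions directly. Here h_n is taken, purely for illustration, as the spherical Hankel function of the first kind, h_n = j_n + j·y_n; the identities B.5 and B.6 hold for either kind:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def h(n, x, derivative=False):
    # Spherical Hankel function of the first kind, h_n = j_n + j*y_n
    return (spherical_jn(n, x, derivative=derivative)
            + 1j * spherical_yn(n, x, derivative=derivative))

n, k, r = 2, 2.0, 1.85
x = k * r
err_B5 = abs(h(n, x, derivative=True) - ((n / x) * h(n, x) - h(n + 1, x)))
# Equation B.6 via the chain rule: d/dr h_n(kr) = k h_n'(kr)
err_B6 = abs(k * h(n, x, derivative=True)
             - ((n / r) * h(n, x) - k * h(n + 1, x)))
print(err_B5, err_B6)                        # both ≈ 0 (machine precision)
```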

Similarly, for the derivative of the steering matrix w.r.t. θ, the nonzero column again appears in the lth position:

Ȧ_{nm,θ_l} = [0, 0, · · · , B(r_l) (∂y^H(Ψ_l)/∂θ_l), · · · , 0, 0]   (B.7)
Equations A.3 and A.4 reveal that the differentiation of the near-field steering vector w.r.t. θ requires the derivative of the associated Legendre function, which is detailed in Appendix A.2.
Finally, Ȧ_{nm,φ_l}, whose only nonzero column is again the lth, can be written as

Ȧ_{nm,φ_l} = [0, 0, · · · , B(r_l) (∂y^H(Ψ_l)/∂φ_l), · · · , 0, 0]   (B.8)

Utilizing Equations A.3 and A.4, the above differentiation can be found using

∂Y_n^m(Ψ_l)/∂φ_l = jm Y_n^m(Ψ_l).   (B.9)

The partial derivatives given by Equations B.2, B.7 and B.8 can be utilized for computing the derivative of the near-field steering matrix.
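The single-nonzero-column structure of Equations B.2, B.7 and B.8 can be sketched as follows. Purely for illustration, B(r) is assumed diagonal with entries h_n(kr) repeated over the degrees m, and the steering vectors are random placeholders; the thesis' actual B(r) follows Equations 7.10 and 7.18:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def h(n, x, derivative=False):
    # Spherical Hankel function of the first kind (illustrative radial term)
    return (spherical_jn(n, x, derivative=derivative)
            + 1j * spherical_yn(n, x, derivative=derivative))

N, L, k = 1, 3, 2.0
orders = [n for n in range(N + 1) for _ in range(-n, n + 1)]  # n for each (n, m)

def dB_dr(r):
    # Derivative of the assumed diagonal B(r): d/dr h_n(kr) = k h_n'(kr), Eq. B.6
    return np.diag([k * h(n, k * r, derivative=True) for n in orders])

rng = np.random.default_rng(0)
r = np.array([1.0, 1.5, 2.2])
yH = (rng.standard_normal((len(orders), L))
      + 1j * rng.standard_normal((len(orders), L)))  # placeholder steering vectors

l = 1                                 # differentiate w.r.t. the range of source l
Adot_rl = np.zeros(((N + 1) ** 2, L), dtype=complex)
Adot_rl[:, l] = dB_dr(r[l]) @ yH[:, l]         # Equation B.2: lone nonzero column
print(np.count_nonzero(np.abs(Adot_rl).sum(axis=0)))  # 1
```

The derivatives w.r.t. θ_l and φ_l (Equations B.7 and B.8) share the same sparsity pattern, with the angular derivative of the steering vector replacing the radial one.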
References

[1] M. R. Bai, J. G. Ih, and J. Benesty, Acoustic Array Systems: Theory, Implementation,


and Application. Wiley-IEEE Press, 2013.

[2] J. Capon, “High-resolution frequency-wavenumber spectrum analysis,” Proceedings of


the IEEE, vol. 57, no. 8, pp. 1408–1418, Aug. 1969.

[3] R. O. Schmidt, “Multiple emitter location and signal parameter estimation,” in Proceed-
ings of RADC Spectrum Estimation Workshop, Griffiss AFB, NY, 1979, pp. 243–258.

[4] B. Yegnanarayana and Hema A. Murthy, “Significance of group delay functions in


spectrum estimation,” IEEE Trans. on Signal Processing, vol. 40, pp. 2281–2289, Sep.
1992.

[5] B. Yegnanarayana, “Formant extraction from linear-prediction phase spectra,” The


Journal of the Acoustical Society of America, vol. 63, no. 5, pp. 1638–1640, 1978.

[6] R. DuHamel, “Pattern synthesis for antenna arrays on circular, elliptical and spherical
surfaces,” Radio Direction Finding Section Elect. Eng. Res. Lab. Rep., Univ. of Illinois,
Urbana, 1952.

[7] M. Hoffman, “Conventions for the analysis of spherical arrays,” Antennas and Propa-
gation, IEEE Transactions on, vol. 11, no. 4, pp. 390–393, 1963.

[8] A. K. Chan, A. Ishimaru et al., “Equally spaced spherical array.” DTIC Document,
Tech. Rep., 1966.

[9] B. Preetham Kumar and G. R. Branner, “The far-field of a spherical array of point
dipoles,” IEEE transactions on antennas and propagation, vol. 42, no. 4, pp. 473–477,
1994.

[10] J. Meyer and G. Elko, “A highly scalable spherical microphone array based on an
orthonormal decomposition of the soundfield,” in Acoustics, Speech, and Signal Pro-
cessing (ICASSP), 2002 IEEE International Conference on, vol. 2. IEEE, 2002, pp.
II–1781.

[11] T. D. Abhayapala and D. B. Ward, “Theory and design of high order sound field micro-
phones using spherical microphone array,” in Acoustics, Speech, and Signal Processing
(ICASSP), 2002 IEEE International Conference on, vol. 2. IEEE, 2002, pp. II–1949.

[12] Q. Huang and T. Wang, “Acoustic source localization in mixed field using spherical
microphone arrays,” EURASIP Journal on Advances in Signal Processing, vol. 2014,
no. 1, pp. 1–16, June 2014.

[13] L. Kumar, K. Singhal, and R. M. Hegde, “Near-field source localization using spher-
ical microphone array,” in Hands-free Speech Communication and Microphone Arrays
(HSCMA), 2014 4th Joint Workshop on, May 2014, pp. 82–86.

[14] ——, “Robust source localization and tracking using MUSIC-Group delay spectrum
over spherical arrays,” in Computational Advances in Multi-Sensor Adaptive Processing
(CAMSAP), 2013 IEEE 5th International Workshop on, St. Martin, France. IEEE,
2013, pp. 304–307.

[15] X. Li, S. Yan, X. Ma, and C. Hou, “Spherical harmonics MUSIC versus conventional
MUSIC,” Applied Acoustics, vol. 72, no. 9, pp. 646–652, 2011.

[16] D. Khaykin and B. Rafaely, “Acoustic analysis by spherical microphone array processing
of room impulse responses,” The Journal of the Acoustical Society of America, vol. 132,
p. 261, 2012.

[17] H. Sun, H. Teutsch, E. Mabande, and W. Kellermann, “Robust localization of multiple


sources in reverberant environments using EB-ESPRIT with spherical microphone ar-

rays,” in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International
Conference on. IEEE, 2011, pp. 117–120.

[18] K. Ichige, K. Saito, and H. Arai, “High resolution DOA estimation using unwrapped
phase information of MUSIC-based noise subspace,” IEICE Trans. Fundam. Electron.
Commun. Comput. Sci., vol. E91-A, pp. 1990–1999, August 2008.

[19] M. Shukla and R. M. Hegde, “Significance of the music-group delay spectrum in


speech acquisition from distant microphones,” in Acoustics Speech and Signal Process-
ing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010, pp. 2738–2741.

[20] A. Barabell, “Improving the resolution performance of eigenstructure-based direction-


finding algorithms,” in Acoustics, Speech, and Signal Processing, IEEE International
Conference on ICASSP’83., vol. 8. IEEE, 1983, pp. 336–339.

[21] R. Roy and T. Kailath, “ESPRIT-estimation of signal parameters via rotational in-
variance techniques,” Acoustics, Speech and Signal Processing, IEEE Transactions on,
vol. 37, no. 7, pp. 984–995, 1989.

[22] C. P. Mathews and M. D. Zoltowski, “Eigenstructure techniques for 2-d angle estimation
with uniform circular arrays,” Signal Processing, IEEE Transactions on, vol. 42, no. 9,
pp. 2395–2407, 1994.

[23] R. A. Kennedy, T. D. Abhayapala, and D. B. Ward, “Broadband nearfield beamforming


using a radial beampattern transformation,” Signal Processing, IEEE Transactions on,
vol. 46, no. 8, pp. 2147–2156, 1998.

[24] E. Fisher and B. Rafaely, “Near-field spherical microphone array processing with radial
filtering,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19,
no. 2, pp. 256–265, 2011.

[25] L. Kumar, A. Tripathy, and R. Hegde, “Robust multi-source localization over planar
arrays using MUSIC-group delay spectrum,” Signal Processing, IEEE Transactions on,
vol. 62, no. 17, pp. 4627–4636, Sept 2014.

[26] L. Kumar and R. Hegde, “Stochastic Cramér-Rao bound analysis for DOA estimation
in spherical harmonics domain,” Signal Processing Letters, IEEE, vol. 22, no. 8, pp.
1030–1034, Aug 2015.

[27] E. G. Williams, Fourier acoustics: sound radiation and nearfield acoustical holography.
academic press, 1999.

[28] C. Coskun, “Robust adaptive beamforming,” Ph.D. dissertation, Technical University


of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark, 2008.

[29] L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders, Fundamentals of acoustics.


John Wiley & Sons, Inc., 1999.

[30] J. Nahas, “Simulation of array-based sound field synthesis methods,” Audio Commu-
nication Group,TU Berlin, Diploma thesis, 2011. [Online]. Available: http://www2.ak.
tu-berlin.de/∼akgroup/ak pub/abschlussarbeiten/2011/NahasJohnny DiplA.pdf

[31] R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics,


Desktop Edition Volume I. Basic Books, 2013, vol. 1.

[32] P. T. D. Abhayapala, “Modal analysis and synthesis of broadband nearfield beamform-


ing arrays,” Ph.D. dissertation, The Australian National University, Telecommunica-
tions Engineering Group, http://hdl.handle.net/1885/46049, 2000.

[33] J. McDonough, K. Kumatani, T. Arakawa, K. Yamamoto, and B. Raj, “Speaker track-


ing with spherical microphone arrays,” in Acoustics, Speech and Signal Processing
(ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3981–3985.

[34] D. Colton and R. Kress, Inverse acoustic and electromagnetic scattering theory.
Springer Science & Business Media, 2012, vol. 93.

[35] B. Rafaely, “Plane wave decomposition of the sound field on a sphereby spherical
convolution,” Institute of Sound and Vibration Research, University of Southampton,
Tech. Rep., May 2003. [Online]. Available: http://eprints.soton.ac.uk/46555/1/
Pub9273.pdf?origin=publication detail

[36] M. C. Chan, “Theory and design of higher order sound field recording,”
Department of Engineering, FEIT, ANU, Honours Thesis, 2003. [Online]. Available:
http://users.cecs.anu.edu.au/∼thush/ugstudents/MCTChanThesis.pdf

[37] C. A. Balanis, Antenna theory: analysis and design. John Wiley & Sons, 2012.

[38] E. Fisher and B. Rafaely, “The nearfield spherical microphone array,” in Acoustics,
Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on,
2008, pp. 5272–5275.

[39] The Eigenmike Microphone Array, http://www.mhacoustics.com/.

[40] J. H. Reed, Software radio: a modern approach to radio engineering. Prentice Hall
Professional, 2002.

[41] E. F. Deprettere, SVD and signal processing: algorithms, applications and architectures.
North-Holland Publishing Co., 1989.

[42] A. A. Gareta, “A multi-microphone approach to speech processing in a smart-room


environment,” Ph.D. dissertation, Universitat Politecnica de Catalunya, 2007.

[43] A. Manikas, Differential geometry in array processing. World Scientific, 2004, vol. 57.

[44] J. Benesty, J. Chen, and Y. Huang, Microphone array signal processing. Springer
Science & Business Media, 2008, vol. 1.

[45] E. C. Ifeachor and B. W. Jervis, Digital signal processing: a practical approach. Pearson
Education, 2002.

[46] I. McCowan, “Microphone arrays: A tutorial,” Queensland University, Australia, pp.


1–38, 2001.

[47] C. P. Mathews and M. D. Zoltowski, “Signal subspace techniques for source localization
with circular sensor arrays,” [Technical Reports], 1994, http://docs.lib.purdue.edu/
ecetr/.

[48] H. L. Van Trees, Optimum Array Processing. Wiley-Interscience, 2002.



[49] B. Rafaely, B. Weiss, and E. Bachmat, “Spatial aliasing in spherical microphone arrays,”
Signal Processing, IEEE Transactions on, vol. 55, no. 3, pp. 1003–1010, 2007.

[50] I. Cohen and J. Benesty, Speech processing in modern communication: challenges and
perspectives. Springer, 2010, vol. 3.

[51] P. A. Naylor and N. D. Gaubitch, Speech dereverberation. Springer Science & Business
Media, 2010.

[52] P. Zahorik, “Direct-to-reverberant energy ratio sensitivity,” The Journal of the
Acoustical Society of America, vol. 112, pp. 2110–2117, November 2002.

[53] C. Knapp and G. Carter, “The generalized correlation method for estimation of time
delay,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 24, no. 4,
pp. 320–327, 1976.

[54] P. R. Roth, “Effective measurements using digital signal analysis,” Spectrum, IEEE,
vol. 8, no. 4, pp. 62–70, 1971.

[55] P. Stoica and R. L. Moses, Spectral analysis of signals. Pearson/Prentice Hall Upper
Saddle River, NJ, 2005.

[56] W. Herbordt and W. Kellermann, “Adaptive beamforming for audio signal acquisition,”
in Adaptive Signal Processing. Springer, 2003, pp. 155–194.

[57] R. Kumaresan and D. W. Tufts, “Estimating the angles of arrival of multiple plane
waves,” Aerospace and Electronic Systems, IEEE Transactions on, no. 1, pp. 134–139,
1983.

[58] J. Chen, K. Yao, and R. Hudson, “Source localization and beamforming,” Signal Pro-
cessing Magazine, IEEE, vol. 19, no. 2, pp. 30–39, 2002.

[59] L. Kumar, R. Mandala, and R. M. Hegde, “MUSIC-group delay based methods for robust
DOA estimation using shrinkage estimators,” in Sensor Array and Multichannel Signal
Processing Workshop (SAM), 2012 IEEE 7th. IEEE, 2012, pp. 281–284.

[60] M. J. Daniels and R. E. Kass, “Shrinkage estimators for covariance matrices,” Biometrics,
vol. 57, no. 4, pp. 1173–1184, 2001.

[61] J. P. Hoffbeck and D. A. Landgrebe, “Covariance matrix estimation and classification
with limited training data,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 18, no. 7, 1996.

[62] R. Mandala, M. Shukla, and R. Hegde, “Group delay based methods for recognition
of distant talking speech,” in Signals, Systems and Computers (ASILOMAR), 2010
Conference Record of the Forty Fourth Asilomar Conference on, Nov 2010, pp. 1702–
1706.

[63] M. Zatman, “How narrow is narrowband?” IEE Proceedings-Radar, Sonar and Navi-
gation, vol. 145, no. 2, pp. 85–91, 1998.

[64] S. Chandran, Advances in Direction-of-arrival Estimation. Artech House, 2005.

[65] D. Ying and Y. Yan, “Robust and fast localization of single speech source using a planar
array,” Signal Processing Letters, IEEE, vol. 20, no. 9, pp. 909–912, 2013.

[66] F. Wang, X. Cui, M. Lu, and Z. Feng, “Decoupled 2D direction-of-arrival estimation


based on sparse signal reconstruction,” EURASIP Journal on Advances in Signal Pro-
cessing, vol. 2015, no. 1, pp. 1–16, 2015.

[67] A. Griffin, D. Pavlidi, M. Puigt, and A. Mouchtaris, “Real-time multiple speaker DOA
estimation in a circular microphone array based on matching pursuit,” in Signal Pro-
cessing Conference (EUSIPCO), 2012 Proceedings of the 20th European. IEEE, 2012,
pp. 2303–2307.

[68] T. Filik and T. E. Tuncer, “Design and evaluation of V-shaped arrays for 2-D DOA
estimation,” in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE
International Conference on. IEEE, 2008, pp. 2477–2480.

[69] R. Mandala, M. Shukla, and R. Hegde, “Group delay based methods for recognition
of distant talking speech,” in Signals, Systems and Computers (ASILOMAR), 2010

Conference Record of the Forty Fourth Asilomar Conference on, Nov 2010, pp. 1702–
1706.

[70] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acous-
tics,” Journal of the Acoustical Society of America, vol. 65, pp. 943–950, 1979.

[71] E. A. Habets, “Room impulse response generator,” [Online], 2003-2010, http://home.


tiscali.nl/ehabets/rir generator.html.

[72] K. Wong and M. Zoltowski, “Root-music-based azimuth-elevation angle-of-arrival es-


timation with uniformly spaced but arbitrarily oriented velocity hydrophones,” Signal
Processing, IEEE Transactions on, vol. 47, no. 12, pp. 3250 –3260, Dec. 1999.

[73] H. Y., “Techniques of eigenvalues estimation and association,” Digital Signal Process-
ing, vol. 7, pp. 253–259, October 1997.

[74] V. Cevher and J. H. McClellan, “2-d sensor perturbation analysis: equivalence to awgn
on array outputs,” in SAM 2002, Washington, DC, 4–6 August 2002.

[75] P. Stoica and N. Arye, “MUSIC, maximum likelihood, and Cramer-Rao bound,” Acous-
tics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 5, pp. 720–741,
1989.

[76] P. Stoica and A. Nehorai, “Comparative performance study of element-space and beam-
space music estimators,” Circuits, Systems and Signal Processing, vol. 10, no. 3, pp.
285–292, 1991.

[77] S. Markovich, S. Gannot, and I. Cohen, “Multichannel eigenspace beamforming in a


reverberant noisy environment with multiple interfering speech signals,” Audio, Speech,
and Language Processing, IEEE Transactions on, vol. 17, no. 6, pp. 1071–1086, 2009.

[78] A. Varga and H. J. Steeneken, “Assessment for automatic speech recognition: Ii. noisex-
92: A database and an experiment to study the effect of additive noise on speech
recognition systems,” Speech Communication, vol. 12, no. 3, pp. 247 – 251, 1993.

[79] John S. Garofolo, TIMIT Acoustic-Phonetic Continuous Speech Corpus. Philadelphia:


Linguistic Data Consortium, 1993.

[80] J. Hansen and B. Pellom, “An effective quality evaluation protocol for speech enhance-
ment algorithms,” in Proc. ICSLP, vol. 7, 1998, pp. 2819–2822.

[81] D. Klatt, “Prediction of perceived phonetic distance from critical-band spectra: A first
step,” in Acoustics, Speech, and Signal Processing, IEEE International Conference on
ICASSP’82., vol. 7. IEEE, 1982, pp. 1278–1281.

[82] “Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end
speech quality assessment of narrow-band telephone networks and speech codecs,” ITU-T
Draft Recommendation P.862, 2001.

[83] M. Seltzer, “Bridging the gap: Towards a unified framework for hands-free speech
recognition using microphone arrays,” in Hands-Free Speech Communication and Mi-
crophone Arrays, 2008. HSCMA 2008, May 2008, pp. 104 –107.

[84] W. Zhang and B. Rao, “Robust broadband beamformer with diagonally loaded con-
straint matrix and its application to speech recognition,” in Proc. IEEE Int. Conf.
Acoust., Speech, Signal Processing. , 2006, pp. 785–788.

[85] CSLU, “Multi channel overlapping numbers corpus distribution,” Linguistic Data Con-
sortium, http://www.cslu.ogi.edu/corpora/corpCurrent.html.

[86] R. Goossens and H. Rogier, “Closed-form 2D angle estimation with a spherical array
via spherical phase mode excitation and ESPRIT,” in Acoustics, Speech and Signal
Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, 2008, pp.
2321–2324.

[87] D. N. Zotkin, R. Duraiswami, and N. A. Gumerov, “Sound field decomposition us-


ing spherical microphone arrays,” in Acoustics, Speech and Signal Processing, 2008.
ICASSP 2008. IEEE International Conference on. IEEE, 2008, pp. 277–280.

[88] B. Rafaely, “Analysis and design of spherical microphone arrays,” Speech and Audio
Processing, IEEE Transactions on, vol. 13, no. 1, pp. 135–143, 2005.

[89] J. R. Driscoll and D. M. Healy, “Computing Fourier transforms and convolutions on


the 2-sphere,” Advances in applied mathematics, vol. 15, no. 2, pp. 202–250, 1994.

[90] B. Rafaely, “Phase-mode versus delay-and-sum spherical microphone array processing,”


Signal Processing Letters, IEEE, vol. 12, no. 10, pp. 713–716, 2005.

[91] G. Arfken and H. J. Weber, Mathematical Methods For Physicists. 5th ed. San Diego
: Academic press, 2001.

[92] Z. Li and R. Duraiswami, “Flexible and optimal design of spherical microphone arrays
for beamforming,” Audio, Speech, and Language Processing, IEEE Transactions on,
vol. 15, no. 2, pp. 702–714, 2007.

[93] B. Rafaely, Y. Peled, M. Agmon, D. Khaykin, and E. Fisher, “Spherical microphone


array beamforming,” in Speech Processing in Modern Communication. Springer, 2010,
pp. 281–305.

[94] P. Stoica, E. G. Larsson, and A. B. Gershman, “The stochastic CRB for array process-
ing: a textbook derivation,” Signal Processing Letters, IEEE, vol. 8, no. 5, pp. 148–150,
2001.

[95] H. Gazzah and S. Marcos, “Cramer-Rao bounds for antenna array design,” Signal
Processing, IEEE Transactions on, vol. 54, no. 1, pp. 336–345, 2006.

[96] A. Weiss and B. Friedlander, “Range and bearing estimation using polynomial rooting,”
Oceanic Engineering, IEEE Journal of, vol. 18, no. 2, pp. 130–137, 1993.

[97] J.-P. Delmas and H. Gazzah, “CRB analysis of near-field source localization using
uniform circular arrays,” in Acoustics, Speech and Signal Processing (ICASSP), 2013
IEEE International Conference on. IEEE, 2013, pp. 3996–4000.

[98] D. T. Vu, A. Renaux, R. Boyer, and S. Marcos, “A Cramér Rao bounds based analysis
of 3D antenna array geometries made from ULA branches,” Multidimensional Systems
and Signal Processing, vol. 24, no. 1, pp. 121–155, 2013.

[99] D. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge uni-


versity press, 2005.

[100] S. M. Kay, “Fundamentals of statistical signal processing, volume i: Estimation theory


(v. 1),” 1993.

[101] H.-C. Song and B.-w. Yoon, “Direction finding of wideband sources in sparse arrays,” in
Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2002. IEEE,
2002, pp. 518–522.

[102] D. P. Jarrett, E. A. Habets, M. R. Thomas, and P. A. Naylor, “Simulating room impulse


responses for spherical microphone arrays,” in Acoustics, Speech and Signal Processing
(ICASSP), 2011 IEEE International Conference on. IEEE, 2011, pp. 129–132.

[103] R. Goossens, H. Rogier, and S. Werbrouck, “Uca root-music with sparse uniform cir-
cular arrays,” Signal Processing, IEEE Transactions on, vol. 56, no. 8, pp. 4095–4099,
2008.

[104] F. Belloni, A. Richter, and V. Koivunen, “Extension of root-music to non-ula array


configurations,” in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Pro-
ceedings. 2006 IEEE International Conference on, vol. 4. IEEE, 2006, pp. IV–IV.

[105] M. Costa, A. Richter, and V. Koivunen, “Unified array manifold decomposition based
on spherical harmonics and 2-d fourier basis,” Signal Processing, IEEE Transactions
on, vol. 58, no. 9, pp. 4634–4645, 2010.

[106] E. Gonen and J. M. Mendel, “Subspace-based direction finding methods,” Madisetti,


VK and Williams DB, editor, The Digital Signal Processing Handbook, chapter, vol. 62,
1999.

[107] J. Meyer and G. W. Elko, “Position independent close-talking microphone,” Signal


processing, vol. 86, no. 6, pp. 1254–1259, 2006.

[108] M. Abramowitz and I. A. Stegun, Handbook of mathematical functions: with formulas,


graphs, and mathematical tables. Courier Dover Publications, 2012.
Publications Related to Thesis Work
In Peer Reviewed International Journal
1. Lalan Kumar and Rajesh Hegde, “Stochastic Cramér-Rao Bound Analysis for DOA Esti-
mation in Spherical Harmonics Domain,” Signal Processing Letters, IEEE, vol. 22, no. 8,
pp. 1030–1034, Aug 2015.
2. Lalan Kumar, Ardhendu Tripathy, and Rajesh Hegde, “Robust Multi-source Localiza-
tion over Planar Arrays using MUSIC-Group Delay Spectrum,” Signal Processing, IEEE
Transactions on, vol. 62, no. 17, pp. 4627–4636, Sept. 1, 2014.
3. Lalan Kumar, and Rajesh Hegde, ”Novel Methods for Localization and Reconstruc-
tion of Near-field Sources in Spherical Harmonics Domain”, Signal Processing, IEEE
Transactions on, Under Review.
In Peer Reviewed International Conferences
1. Lalan Kumar, Kushagra Singhal, and Rajesh Hegde, ”Near-field source localization
using spherical microphone array,” Hands-free Speech Communication and Microphone
Arrays (HSCMA), 2014, 4th Joint Workshop on , pp.82-86, 12-14 May 2014
2. Lalan Kumar, Kushagra Singhal, and Rajesh Hegde, “Robust Source Localization and
Tracking using MUSIC-Group Delay Spectrum over Spherical Arrays,” in Computational Ad-
vances in Multi-Sensor Adaptive Processing (CAMSAP), 2013 5th IEEE International
Workshop on, Dec 2013, St. Martin, France, pp. 304–307.
3. Lalan Kumar, Rohan Mandala and Rajesh Hegde, “MUSIC-Group Delay Based Meth-
ods for Robust DOA Estimation using Shrinkage Estimators,” IEEE Sensor Array and
Multichannel (SAM 2012) Signal Processing Workshop, June 2012, Hoboken, NJ, pp.
281–284.