
Enhancement of Speech Signals Corrupted by Impulsive Noise Using Wavelets and Adaptive Median Filtering
Prateek Basavapur Swamy, Rohini S. Hallikar, M Uttara Kumari
Department of Electronics and Communication Engineering,
R.V. College of Engineering,
Bangalore
Email: bs.prateek@gmail.com, rohinish@rvce.edu.in, uttarakumari@rvce.edu.in
Abstract: This paper presents a noise cancellation technique
to remove impulsive noises that commonly corrupt speech signals.
A discrete wavelet transform is applied on the corrupted speech
signal to obtain the approximation and detail coefficients. Reconstruction is done using only the detail coefficients. A threshold
depending on the signal statistics is applied on the reconstructed
signal to detect information of the time occurrence of impulsive
noise. Based on the number of samples at a stretch that are
corrupted, an adaptive filter with a variable size window is applied on the corrupted speech signal to remove the impulse noise.
Evaluations of the proposed system show that the intelligibility
of speech is improved.

I. INTRODUCTION

Noise that affects signals may be classified into the following types based on its time and frequency characteristics:
narrow band noise, band limited white noise, colored noise, impulse noise and transient noise pulses [1]. More than one type
of noise may be simultaneously present at any given instant of
time. Noise is commonly modeled as Additive Gaussian White
Noise and the filters that remove it use these characteristics.
However, when speech is corrupted by impulsive noise, these
filters do not perform optimally. Impulsive noise corresponds
to noise that is very short in duration i.e. tending to last up to a
few milliseconds. It is typically loud during its short duration
of occurrence. As a result, about a hundred samples at a stretch
or more can be corrupted by noise depending on the sampling
frequency. Typical examples of impulsive noise which decrease
speech intelligibility and are unpleasant include background
keyboard strokes during video conferencing, indicator clicks in
cars, rain drops hitting a hard surface, certain factory sounds,
machine gun firing etc. [2], [3]. Various techniques have been
designed to tackle this problem. Some of them include the
classical Signal Dependent Rank Order Mean Algorithm [4] and
methods based on wavelets [2], [5], soft decision and recursion
[6], diffusion filtering [7] and Bayesian frameworks [8]. Many
of these techniques however are not robust to speech signals
that are affected by impulse noise of varying time durations.
The technique used in [5] assumes that we have information
about the characteristics of the energy distribution of the
impulse noise signal. The technique used in [9] is optimized
to remove impulsive disturbances from old gramophone music
recordings as it uses the characteristics of the music signal to
perform processing. The Signal Dependent Rank Order Mean Algorithm [4] is very easily realizable in hardware. However, it assumes that multiple consecutive samples of a speech signal are not corrupted. The proposed robust technique is designed to remove impulsive noise of varying time duration (i.e. anything from a few samples to a hundred or more samples being corrupted at a stretch). The block diagram of the entire system is shown in Figure 1.

© 2013 IEEE, 978-1-4673-6190-3/13/$31.00
The technique works in the following manner:

1) Apply a discrete wavelet transform to the speech signal to obtain approximation and detail coefficients.
2) Set the approximation coefficients to 0 and apply an inverse discrete wavelet transform using only the detail coefficients at level 1.
3) Apply a threshold based on a function of the $l_1$ norm of the signal obtained from the previous step, and classify anything above that level as noise and anything below that level as clean speech.
4) The signal obtained from the previous step gives information about the time occurrence of the impulse noise. This signal is then interpolated using a window of a certain width to account for undetected samples of impulse noise. This results in the detector output having localized pulses that represent noise.
5) Using the detector output from the previous step, a variable window width median filter is applied to only the samples of the speech signal that are corrupted by impulsive noise.

This paper is organized as follows. The detector algorithm and the adaptive median filter algorithm to remove the detected noise are presented in Sections II and III respectively. The experimental setup and results are presented in Section IV. The paper ends with the conclusion and future work, outlined in Section V.
II. DETECTION OF IMPULSIVE NOISE IN TIME DOMAIN

The detector algorithm works in 3 stages. Stage 1 uses wavelets to obtain a low frequency part represented by the approximation coefficients and a high frequency part represented by the detail coefficients. The detail coefficients are then thresholded in Stage 2 to obtain the time occurrences of impulsive noise. Stage 3 interpolates the signal obtained from Stage 2 using a window to obtain localized pulses. These pulses indicate that impulsive noise acts on the speech signal during those samples.

Fig. 1: Block diagram of entire system
A. Discrete Wavelet Transform
Wavelets are localized waves that enable us to obtain
simultaneous time and frequency information of the signal.
Dilations and translations of a mother function give rise to an
orthogonal wavelet basis. Different families of wavelets have
different mother functions.

$$\Phi_{(s,l)}(x) = 2^{-s/2}\,\Phi(2^{-s}x - l) \tag{1}$$

$s$ and $l$ are integers that scale and dilate the mother function $\Phi$ to generate different wavelet families like Daubechies, Symlet, Coiflet, Dmey and so on.

$$w(x) = \sum_{k=-1}^{N-2} (-1)^k c_{k+1}\,\Phi(2x + k) \tag{2}$$

where the $c_k$ are the wavelet coefficients and $w$ is the scaling function for the corresponding mother function $\Phi$. The coefficients $\{c_0, c_1, c_2, \ldots, c_n\}$ are placed in a transformation matrix. These coefficients are ordered using two dominant patterns: smoothing coefficients (approximation) and detail coefficients [10].
In the proposed method we use only level 1 approximation and detail coefficients. Level 1 coefficients are obtained when the discrete wavelet transform is applied just once to the original signal. Most of the signal energy is concentrated in the approximation coefficients [11]. The detail coefficients contain the high frequency part of the speech.
$$\{cA_1, cD_1\} = DWT(x) \tag{3}$$

where $cA_1$ and $cD_1$ represent the approximation and detail coefficients of level 1, $x$ represents the input speech signal corrupted with impulsive noise, and $DWT$ represents the wavelet transform operator matrix, which is different for different wavelet families.

In the next step, we obtain a signal in the time domain by reconstructing from only the detail coefficients:

$$y = IDWT(v, cD_1) \tag{4}$$

where $v$ represents a zero vector of length equal to the length of $cA_1$ and $IDWT$ is an inverse wavelet transform operator matrix.
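As a rough illustration of equations (3) and (4), the sketch below computes a level 1 transform, zeroes the approximation coefficients and reconstructs from the detail coefficients alone. It uses the Haar wavelet only because its transform fits in a few lines; this is an assumption made for brevity, since the analysis in this paper uses the dmey wavelet. The function names are hypothetical and the input is assumed to have even length.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """Level 1 Haar DWT. Returns (cA1, cD1). Assumes len(x) is even."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch expects an even number of samples")
    cA = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    cD = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return cA, cD

def haar_idwt(cA, cD):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    y = []
    for a, d in zip(cA, cD):
        y.append((a + d) / SQRT2)
        y.append((a - d) / SQRT2)
    return y

def detail_reconstruction(x):
    """Equation (4): reconstruct with the approximation replaced by zeros."""
    cA, cD = haar_dwt(x)
    v = [0.0] * len(cA)          # zero vector with the length of cA1
    return haar_idwt(v, cD)

# A single isolated spike survives in y, while flat regions map to zero.
x = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
y = detail_reconstruction(x)
```

With a longer wavelet filter such as dmey the reconstruction is smoother, but the principle is the same: slowly varying speech content is mostly carried by the approximation coefficients, so abrupt impulses dominate $y$.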
B. Thresholding
We perform thresholding on $y$, the detail coefficient reconstruction obtained from equation (4). $y$ represents the time domain reconstruction of the high frequency part of the corrupted speech signal. The $l_p$ norm of the signal is given by

$$\|y\|_p = \left( \sum_{i=1}^{N} |y_i|^p \right)^{1/p}, \quad p \in [1, \infty) \tag{5}$$

The signal $y$ obtained from the inverse discrete wavelet transform is fed into an algorithm that computes a threshold based on the $l_1$ norm of the signal. Using $p = 1$ in the above equation gives the $l_1$ norm of the signal.

$$T = \frac{\|y\|_1}{N} \tag{6}$$

Here $N$ denotes the length of $y$ and $T$ denotes the threshold. The signal is then run through a thresholding filter that identifies all signal values that exceed the threshold. A value of 1 is given to identify them for further analysis.

$$Z(i) = \begin{cases} 1, & \text{if } |y(i)| \geq T \\ 0, & \text{if } |y(i)| < T \end{cases} \tag{7}$$

The obtained signal gives the time occurrence of the impulsive noises. $Z(i) = 1$ indicates that there exists some impulsive noise at time instant $i$ in $x(i)$, the corrupted speech signal. This is in contrast to the denoising method used in [12], in which a threshold is employed in the wavelet domain to modify coefficients in the wavelet domain itself before reconstruction.

The uncorrupted speech signal is shown in the first plot of Figure 2. The second plot shows the speech signal corrupted by impulsive noise. The third plot shows the impulsive noise separately. It can be seen that noise of different time durations acts on the speech signal to corrupt it. This is the kind of impulsive noise that is present in the real world. The fourth plot in the same figure shows the reconstructed sequence $y$ and also indicates the threshold computed from equation (6).
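The thresholding rule of equations (6) and (7) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up sample values; it takes the threshold to be the mean absolute value of $y$, i.e. the $l_1$ norm divided by the signal length.

```python
def l1_threshold(y):
    """Equation (6): T = ||y||_1 / N, the mean absolute value of y."""
    return sum(abs(v) for v in y) / len(y)

def detect_impulses(y):
    """Equation (7): mark samples whose magnitude is at or above T."""
    T = l1_threshold(y)
    Z = [1 if abs(v) >= T else 0 for v in y]
    return Z, T

# Illustrative values only: two large samples among small ones.
y = [0.01, -0.02, 0.01, 0.40, -0.35, 0.02, -0.01, 0.01]
Z, T = detect_impulses(y)
# Z marks only the two large samples: [0, 0, 0, 1, 1, 0, 0, 0]
```

Because the threshold is derived from the statistics of $y$ itself, it adapts to the overall level of the high frequency reconstruction rather than relying on a fixed constant.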


C. Interpolation
For longer duration impulsive sounds, some of the time instants of occurrence of the noise may go undetected. We use the following method to mitigate this problem:

$$\forall k : (j < k < m) \text{ and } (m - j < w), \quad \tilde{Z}(k) = \begin{cases} 1, & \text{if } Z(j) = 1 \text{ and } Z(m) = 1 \\ Z(k), & \text{elsewhere} \end{cases} \tag{8}$$
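The gap filling rule of equation (8) can be sketched as follows. This is a minimal illustration, assuming `Z` is a list of 0s and 1s and `w` is a window width given in samples; the function name is hypothetical. Filling the gaps between neighbouring detections is sufficient, because any two detections closer than `w` only have closer detections between them.

```python
def interpolate_detections(Z, w):
    """Fill every gap between two detections that are less than w samples apart."""
    Zt = list(Z)
    ones = [i for i, z in enumerate(Z) if z == 1]
    for j, m in zip(ones, ones[1:]):
        if m - j < w:
            for k in range(j + 1, m):
                Zt[k] = 1
    return Zt

# Detections at indices 1 and 4 are 3 samples apart, so w = 4 bridges them.
# The detection at index 10 is too far from index 4 and stays isolated.
Z = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
Zt = interpolate_detections(Z, 4)
# Zt == [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
```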

In equation (8), $\tilde{Z}(k)$ is the interpolator output and $w$ is the expected width of the maximum duration occurring impulsive noise. This can be computed from the sampling rate as

$$w = f_s t_w \tag{9}$$

where $f_s$ and $t_w$ are the sampling rate and the maximum expected duration of impulse noise. The output of the interpolator $\tilde{Z}(k)$ is shown in Figure 3. It consists of localized pulses that indicate that noise was present in the speech signal during those time instants. The interpolator information is then used by the adaptive median filter as indicated in the block diagram shown in Figure 1.

III. ADAPTIVE WINDOW MEDIAN FILTER

Only certain samples of the noisy signal are filtered to ensure that the clean parts of the speech are not modified. The noisy parts are given by the output of the interpolator from the previous block of the system. An adaptive window width median filter is used. The median filter removes noise with duration less than half the size of the filter window. If a block of samples is found to be corrupted by the detector, that whole block of samples is median filtered with a window of size more than twice the block size to remove the noise. The steps involved in this process are:

A. Formation of a temporary local sequence

A temporary sequence is formed by taking the samples corrupted by noise and just enough preceding and succeeding samples so that median filtering with a window size of more than twice the size of the local noise pulse is possible.

$$\forall k : \prod_{k=j}^{j+m} \tilde{Z}(k) = 1, \quad l = j - m, \quad h = j + 2m; \quad t(1 : h - l + 1) = |x(l : h)| \tag{10}$$
B. Median Filtering of the temporary sequence
The temporary sequence obtained from the previous step
is passed through a median filter.
$$w = 2m + 1, \quad temp = \mathrm{median}_w(t) \tag{11}$$

Here $w$ represents the width of the window used. As $w$ is just more than twice the width of the local noise pulse in the detector output $\tilde{Z}$, the filter is able to remove the impulsive noise.

Fig. 2: The first plot shows 500 samples of a long speech signal that is devoid of impulsive noise. The second plot shows the speech signal corrupted by impulse noise. The impulsive noise is shown separately in the third plot. Impulse noise occurs at 2 time instants. The first time it occurs, it corrupts very few samples. The second time it occurs, it corrupts about 50 samples at a stretch (i.e. a few milliseconds). The fourth plot shows $y$, the signal obtained from the inverse discrete wavelet transform, which is then thresholded in the next step.

C. Replacement of the corrupted samples
Only the samples of the noisy speech signal that are corrupted by the impulse noise are replaced by the corresponding values from the temporary sequence.

$$\hat{s}(k) = \begin{cases} \mathrm{sgn}(x(k))\, temp(l + 1), & \forall l : j < l + 1 < j + m \\ x(k), & \text{elsewhere} \end{cases} \tag{12}$$

The third plot in Figure 4 shows the result of the adaptive median filtering process, $\hat{s}$, which is the estimate of the uncorrupted speech signal $s$.
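The three steps of this section can be sketched together as follows. This is a minimal illustration under stated assumptions rather than the exact procedure of equations (10) to (12): $m$ is taken to be the number of samples in a detected run, the median is computed over a window of $2m + 1$ magnitudes centred on each corrupted sample and clipped at the signal boundaries, and the sign of the corrupted sample is restored afterwards. All function names are hypothetical.

```python
from statistics import median

def runs_of_ones(Zt):
    """Yield (start, length) for each run of consecutive 1s in Zt."""
    start = None
    for i, z in enumerate(Zt):
        if z == 1 and start is None:
            start = i
        elif z != 1 and start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(Zt) - start

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def adaptive_median_filter(x, Zt):
    """Replace only the flagged samples; clean samples are copied unchanged."""
    s_hat = list(x)
    for j, m in runs_of_ones(Zt):
        for k in range(j, j + m):
            lo = max(0, k - m)
            hi = min(len(x), k + m + 1)
            window = [abs(v) for v in x[lo:hi]]   # magnitudes, as in |x(l:h)|
            s_hat[k] = sign(x[k]) * median(window)
    return s_hat

# Illustrative values only: a two sample impulse inside a low level signal.
x = [0.1, -0.1, 0.1, 0.9, -0.8, 0.1, -0.1, 0.1, -0.1]
Zt = [0, 0, 0, 1, 1, 0, 0, 0, 0]
s_hat = adaptive_median_filter(x, Zt)
```

Because the window is more than twice as wide as the detected run, the corrupted samples are always a minority within it, so the median falls on the magnitude of the surrounding clean samples.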

IV. IMPLEMENTATION RESULTS

A. Speech Corpus and noise modeling
Our training data consisted of speech recordings sampled at 11.025 kHz from an audio recorder. Impulsive noise was modeled artificially to have a Gaussian amplitude distribution but occurring only during certain short time durations [13]. The third plot of Figure 2 illustrates the idea: sample numbers 220 to about 260 are corrupted by impulsive noise that has a Gaussian amplitude distribution. The noise is modeled in 2 stages:

1) Time Invariant Markov process: In the first stage of the noise modelling process, a 2 state Markov process is considered. A Markov process is one in which the probability of the process being in a particular state depends only on its immediate past state. One of the states is modeled to be impulsive and the other state is modeled to be a non impulsive state. If a block of N samples is being processed at a stretch, the length of the random sequence of states generated by the Markov model is set to be equal to N. In Figure 5, State 2 is considered to be an impulsive state, i.e. in this state the noise is said to be present. State 1 is a non impulsive state and the noise process is considered to be switched off in this state [13], [14]. The Markov model considered has 2 matrices associated with it: a transition probability matrix, which specifies the probability of transitions from one state to any other state (including itself), and an emission matrix, which gives information about which symbol is emitted from which state. In general, the transition probability matrix that we use to model the impulsive noise is as follows

$$Trans = \begin{bmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{bmatrix} \tag{13}$$

$Trans(i, j)$ gives the probability of transition from State $i$ to State $j$. For example, $Trans(1, 2) = \alpha$ gives the probability of transition from State 1 to State 2.

Fig. 3: The interpolator output $\tilde{Z}$ goes high whenever $y$ exceeds the threshold. The noise is also plotted on the same graph (legend: Detector Output, Noise) to illustrate the proper functioning of the detector.

Fig. 4: The first plot shows 500 samples of a long uncorrupted speech signal. The second plot shows the speech signal corrupted by impulsive noise. The third plot shows the adaptive window median filtered output.

Fig. 5: A 2 state Markov chain

The transition probabilities can be adjusted depending upon the expected widths of the kind of impulsive noise that specifically occurs and the probability of occurrence of the noise. By increasing the value of $Trans(1, 2)$, impulsive noise that occurs more frequently can be modeled. By increasing the value of $Trans(2, 1)$, the time duration of each occurrence of the impulse noise can be reduced. We specify the emission matrix as follows

$$Emis = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \tag{14}$$

$Emis(i, j)$ gives the probability that symbol $j$ is emitted from state $i$. Let us define the symbol 1 to be the number 1 and the symbol 0 to be the number 0. Thus we end up with a stream of 0s and 1s whose length is equal to the block size being processed. Let us name this stream $a_k$.

2) Multiplication with a Gaussian Distribution: A Gaussian distribution with mean equal to zero and variance $\sigma^2$ is multiplied with the stream $a_k$ generated in the previous step. The result is Gaussian distributed noise that occurs randomly over very short durations of time. This completes the modeling of the impulsive noise. The waveform generated may look like the one shown in Figure 6.
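The two stage noise model can be sketched as follows. This is a minimal illustration under stated assumptions: the chain starts in the non impulsive state, the emission step is deterministic (the impulsive state emits 1 and the other state emits 0), and the parameter names `p12`, `p21` and `sigma` as well as the numeric values are hypothetical, not taken from the experiments reported here.

```python
import random

def markov_states(n, p12, p21, rng):
    """Stage 1: generate n states of a 2 state chain (1 = off, 2 = impulsive)."""
    states = []
    state = 1                      # assumed initial state: noise switched off
    for _ in range(n):
        u = rng.random()
        if state == 1:
            state = 2 if u < p12 else 1
        else:
            state = 1 if u < p21 else 2
        states.append(state)
    return states

def impulsive_noise(n, p12, p21, sigma, seed=0):
    """Stage 2: gate zero mean Gaussian samples with the 0/1 stream a_k."""
    rng = random.Random(seed)
    a = [1 if s == 2 else 0 for s in markov_states(n, p12, p21, rng)]
    noise = [ak * rng.gauss(0.0, sigma) for ak in a]
    return noise, a

# Rare onsets (small p12) and short bursts (larger p21); illustrative values.
noise, a = impulsive_noise(n=1000, p12=0.005, p21=0.1, sigma=0.2, seed=1)
```

Raising `p12` makes bursts start more often, and raising `p21` shortens each burst, which mirrors the roles of $Trans(1, 2)$ and $Trans(2, 1)$ described above.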

Fig. 6: Impulsive noise of length 40000 samples

B. Detector efficiency
The performance efficiency of the detector presented in Section II is investigated here. We define the efficiency metric as follows:

$$\eta = \frac{100\,(N - errors)}{N} \tag{15}$$

Here, $N$ represents the number of samples being processed and $errors$ gives the total number of errors made in the detection process. An error may occur in 2 ways:
1) A sample is classified as having noise but contained no noise.
2) A sample is classified as pure/uncorrupted but had noise present.

The efficiency metric obtained when different wavelet families are used in equation (3) is presented in Table I.

TABLE I: Detection efficiency using different wavelets
Wavelet             Efficiency ($\eta$)
Haar                46.41%
Daubechies (db6)    93.90%
Symlet (sym2)       84.20%
Coiflet             83.01%
Dmey                94.77%

Similar detector efficiencies were obtained under the same Signal to Noise Ratio values for sounds taken from the TIMIT database as well. We see that the db6 and the dmey wavelets outperform the other wavelets. We have used the dmey wavelet for our entire analysis in this paper.

C. Objective Evaluation Criteria
We consider 2 pure speech signals corrupted by vastly different kinds of impulsive noise (of varying time durations) by adjusting the transition probabilities in the Markov model of noise. The SNR of the corrupted speech signals ($x_1$ and $x_2$) and the correlation coefficient of the corrupted speech signals with the pure speech signals ($s_1$ and $s_2$) are given in Table II. In Table II, $\rho_{s,s+n}$ indicates the correlation coefficient between the clean speech signal and the impulse noise corrupted speech signal.

Let $\rho_{s,\hat{s}}$ be the correlation coefficient [15] between the uncorrupted speech signal and the adaptive filtered signal. The percentage increase in the correlation coefficient after the speech enhancement is computed as

$$PercentageIncrease = \frac{(\rho_{s,\hat{s}} - \rho_{s,s+n}) \times 100}{\rho_{s,s+n}} \tag{16}$$

The formula used for computing the output Signal to Noise Ratio ($SNR_{out}$) is as shown below [5]

$$SNR_{out} = 10 \log \left( \frac{|x(n)|^2}{|\hat{x}(n) - x(n)|^2} \right) \tag{17}$$

The objective metrics that indicate that the speech is enhanced are given in Table III. In Table III, the proposed method is compared against the Signal Dependent Rank Order Mean Algorithm proposed in [4].
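The metrics of equations (15) to (17) can be sketched as follows. This is a minimal illustration with hypothetical function names. Two interpretations are assumptions: the correlation coefficient is taken to be the Pearson correlation between two equal length sequences, and the squared magnitudes in equation (17) are taken as energies summed over all samples with a base 10 logarithm.

```python
import math

def detection_efficiency(Z_detected, Z_true):
    """Equation (15): percentage of samples classified correctly."""
    N = len(Z_true)
    errors = sum(1 for d, t in zip(Z_detected, Z_true) if d != t)
    return 100.0 * (N - errors) / N

def correlation_coefficient(a, b):
    """Pearson correlation between two equal length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def percentage_increase(rho_after, rho_before):
    """Equation (16): relative gain in correlation with the clean signal."""
    return (rho_after - rho_before) * 100.0 / rho_before

def snr_out_db(reference, estimate):
    """Equation (17), read as signal energy over error energy, in dB."""
    signal_energy = sum(v * v for v in reference)
    error_energy = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return 10.0 * math.log10(signal_energy / error_energy)

# One wrong decision out of four samples gives 75 percent efficiency.
eta = detection_efficiency([1, 0, 0, 1], [1, 0, 1, 1])
```

Both error types listed in Section IV-B are counted by the same comparison, since a false alarm and a missed detection each show up as a mismatch between the detector output and the true noise positions.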
TABLE II: Signal statistics of a pure speech signal corrupted by impulsive noise
Signal Statistics              Corrupted Speech Signal ($x_1$)    Corrupted Speech Signal ($x_2$)
Signal to Noise Ratio (SNR)    3.80 dB                            -3.56 dB
$\rho_{s,s+n}$                 0.841                              0.5591

The removal of impulsive noise from a 3 second speech signal using the proposed technique is illustrated in Figure 7. Impulsive noise of various time durations is efficiently removed, as seen in the third plot of Figure 7.

Fig. 7: The first plot shows an uncorrupted 3 second speech signal at a sampling frequency of 11.025 kHz. The second plot shows the speech signal corrupted by impulsive noise. The third plot shows the output of the proposed technique. It can be seen that the impulsive noise is removed effectively.

TABLE III: The improvement in SNR and the percentage increase in correlation coefficient are indicated for the proposed technique and the SDROM algorithm used in [4].
Signal    Improvement in SNR                  Improvement in $\rho$
          Proposed Technique    SDROM         Proposed Technique    SDROM
$x_1$     8.93 dB               4.63 dB       15.80%                10.70%
$x_2$     18.7 dB               11.98 dB      76.10%                66.50%

V. CONCLUSION

This paper presents a novel robust technique to remove impulsive noise that affects speech signals. The word robust is used in the sense that the technique efficiently removes impulsive noise of varying time durations. Our approach is based on using wavelets and adaptive median filters. Experimental results indicate improvements in speech quality metrics. Future work might involve a real time implementation of the system so that it could be used in devices like cochlear implants.

REFERENCES

[1] S. V. Vaseghi, Advanced digital signal processing and noise reduction. Wiley, 2008, pp. 29-43.
[2] R. C. Nongpiur, "Impulse noise removal in speech using wavelets," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, 2008, pp. 1593-1596.
[3] A. Subramanya, M. L. Seltzer, and A. Acero, "Automatic removal of typed keystrokes from speech signals," Signal Processing Letters, IEEE, vol. 14, no. 5, pp. 363-366, 2007.
[4] C. Chandra, M. S. Moore, and S. K. Mitra, "An efficient method for the removal of impulse noise from speech and audio signals," in Circuits and Systems, 1998. ISCAS'98. Proceedings of the 1998 IEEE International Symposium on, vol. 4. IEEE, 1998, pp. 206-208.
[5] Z. He, X. Guo, and M. Zhang, "Detection and removal of impulsive colored noise for speech enhancement," in Information and Automation (ICIA), 2010 IEEE International Conference on. IEEE, 2010, pp. 2320-2324.
[6] S. Zahedpour, S. Feizi, A. Amini, M. Ferdosizadeh, and F. Marvasti, "Impulsive noise cancellation based on soft decision and recursion," Instrumentation and Measurement, IEEE Transactions on, vol. 58, no. 8, pp. 2780-2790, 2009.
[7] R. Talmon, I. Cohen, and S. Gannot, "Speech enhancement in transient noise environment using diffusion filtering," in Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010, pp. 4782-4785.
[8] J. Murphy and S. Godsill, "Joint bayesian removal of impulse and background noise," in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 2011, pp. 261-264.
[9] M. Niedzwiecki and M. Ciolek, "Elimination of impulsive disturbances from archive audio signals using bidirectional processing," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 5, pp. 1046-1059, 2013.
[10] A. Graps, "An introduction to wavelets," Computational Science & Engineering, IEEE, vol. 2, no. 2, pp. 50-61, 1995.
[11] J. I. Agbinya, "Discrete wavelet transform techniques in speech processing," in TENCON'96. Proceedings. 1996 IEEE TENCON. Digital Signal Processing Applications, vol. 2. IEEE, 1996, pp. 514-519.
[12] D. L. Donoho, "De-noising by soft-thresholding," Information Theory, IEEE Transactions on, vol. 41, no. 3, pp. 613-627, 1995.
[13] S. V. Vaseghi, Advanced digital signal processing and noise reduction. Wiley, 2008, pp. 362-364.
[14] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012, pp. 71-74.
[15] D. P. Bertsekas and J. N. Tsitsiklis, Introduction to probability, vol. 1. Athena Scientific Belmont, MA, 2002, pp. 181-183.
