
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, VOLUME 3, ISSUE 4, APRIL 2013

Suppression of Noise in Speech using Adaptive Gain Equalizer


Anil Chokkarapu, Sarath C Uppalapati and Abhiram Chinthakuntla

Abstract: The quality of speech during communication is generally affected by surrounding noise and interference. Speech enhancement is one of the most widely used branches of signal processing for improving the quality of a speech signal and reducing the noise. One method for reducing noise in speech signals is the Adaptive Gain Equalizer (AGE). This paper presents a real-time implementation of an AGE noise suppressor using a uniform FFT (Fast Fourier Transform) modulated filter bank for a speech communication system. Our results show that the low complexity, low delay and high flexibility of this method make it suitable for a wide range of implementations.

Index Terms: Adaptive Gain Equalizer, FFT Filter Bank, Noise Suppression.

1. INTRODUCTION

The objective of the AGE is to divide the input signal into a number of frequency sub bands that are individually and adaptively boosted according to a short term signal-to-noise ratio (SNR) estimate in each sub band at every time instant. In other words, the method focuses on enhancing the speech rather than suppressing the noise. A high sub band SNR estimate indicates that the sub band signal content is less corrupted by noise, so the sub band should be boosted. A low sub band SNR estimate indicates that the surrounding noise is dominant in the sub band at hand, so no boosting of the sub band speech should be performed. To achieve this speech boosting effect, a short term average for speech tracking and a long term average for background noise floor level tracking are computed simultaneously. The ratio of these two quantities yields a gain function that weights the sub band signal directly according to the sub band SNR estimate at that particular time instant. If only noise is present in the signal, the noise floor level estimate and the short term average will be approximately the same; their ratio will be close to unity and no alteration of the sub band signal is performed. If speech is present, the short term average increases while the noise floor level estimate remains approximately unchanged; the ratio becomes larger than unity, amplifying the signal in the sub band at hand.

A general filter bank is a group of parallel low pass, band pass or high pass filters. It converts the normal time domain representation of the signal into a time-frequency domain representation, which is commonly used in modern speech processing methods. Here, we use a uniform FFT modulated filter bank, which comprises band pass filters with very little mutual overlap in frequency, as shown in Fig. 1. The notation uniform reflects the fact that the filters are uniformly distributed on the frequency axis during the modulation process.
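As a toy illustration of this principle (our own example with invented magnitudes, not taken from the paper), the ratio of a short term amplitude average to a noise floor estimate translates directly into a sub band gain:

```python
# Toy illustration of the AGE principle for a single sub band.
# The magnitudes below are invented purely for illustration.

def subband_gain(short_term_avg, noise_floor, max_gain=4.0):
    """Gain grows with the ratio of the short term average to the noise floor."""
    return min(max_gain, short_term_avg / noise_floor)

print(subband_gain(0.01, 0.01))   # noise only: ratio ~1, gain ~1 (no change)
print(subband_gain(0.08, 0.01))   # speech present: ratio 8, gain capped at 4
```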

Fig. 1. A bank of eight band pass filters hk[n], with Fourier transforms Hk[f] = F{hk[n]}, comprises a filter bank.

2. PROBLEM STATEMENT AND MAIN CONTRIBUTION

In a typical situation, a speech signal is distorted by noise, i.e., the noise is acoustically added to the speech. The goal is to suppress the noise using a speech enhancement method, resulting in an output signal with a higher SNR. Our main contribution is to design the adaptive gain equalizer noise suppressor for speech enhancement using MATLAB, then implement the method with CC Studio on the TMS320C6713 processor and validate the results.

3. PROBLEM SOLUTION

Anil Chokkarapu is with the School of Engineering, Blekinge Tekniska Högskola, Karlskrona, Sweden. Sarath C Uppalapati is with the School of Engineering, Blekinge Tekniska Högskola, Karlskrona, Sweden. Abhiram Chinthakuntla is with the School of Engineering, Blekinge Tekniska Högskola, Karlskrona, Sweden.

3.1 Uniform FFT Modulated Filter bank:


This filter bank consists of K band pass filters h_k[n], for k = 0, 1, ..., K-1, each of length N taps. The band pass filters are created by modulating (frequency-shifting) a low pass prototype filter h_0[n] (which is equivalent to the first band pass filter, centred at DC, i.e. at frequency 0), according to

h_k[n] = h_0[n] e^{j 2π k n / K},   n = 0, 1, ..., N-1.   (1)

Therefore, the Z-transform of each modulated band pass filter is given by

H_k(z) = H_0( z e^{-j 2π k / K} ).   (2)

We assume that the signal filtered by each modulated band pass filter is subject to decimation by a factor D, where D = K/O and O denotes the over-sampling ratio. A filter followed by a decimator is realised efficiently with a polyphase implementation, achieved by dividing the prototype filter into O groups, each containing D polyphase components. An IFFT is then applied to each of the O groups of polyphase components individually and, since the resulting sub band indices are not in order, they are rearranged in increasing order. Because the input speech signal is real valued, its spectrum is conjugate symmetric, so only the first half of the frequency bands needs to be computed; the other half is obtained by taking the complex conjugates of the first half. This implementation is known as the analysis filter bank and is shown in Fig. 2.

Fig. 2. Analysis filter bank.
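As a rough illustration (our own Python sketch, not the authors' code; the values of K, N and O and the firwin prototype design are assumptions), the following implements the direct, non-polyphase form of Eq. (1): each sub band is obtained by modulating the prototype, filtering and decimating by D. The polyphase/IFFT structure described above is an efficient rearrangement of this same computation.

```python
import numpy as np
from scipy.signal import firwin

# Direct (non-polyphase) sketch of a uniform FFT modulated analysis filter
# bank, following Eq. (1): h_k[n] = h_0[n] * exp(j*2*pi*k*n/K).
# K, N, O and the prototype design below are illustrative assumptions.

K = 8          # number of sub bands
N = 64         # prototype filter length in taps
O = 2          # over-sampling ratio
D = K // O     # decimation factor D = K/O

h0 = firwin(N, 1.0 / K)          # low pass prototype (band pass filter at DC)
n = np.arange(N)

def analysis(x):
    """Split x into K decimated sub band signals x_k[m]."""
    subbands = []
    for k in range(K):
        hk = h0 * np.exp(2j * np.pi * k * n / K)   # Eq. (1): modulated band pass filter
        xk = np.convolve(x, hk)[: len(x)]          # band pass filtering
        subbands.append(xk[::D])                   # decimation by a factor D
    return subbands

# Example: decompose one second of white noise sampled at 8 kHz
bands = analysis(np.random.randn(8000))
print(len(bands), bands[0].shape)                  # 8 sub bands, each 8000/D samples long
```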

Designing the synthesis filter bank is similar to designing the analysis filter bank, except that the polyphase components of the synthesis filter bank are obtained by flipping and conjugating the analysis polyphase components, and an FFT is used in place of the IFFT of the analysis filter bank. The model of the synthesis filter bank is shown in Fig. 3.

Fig. 3. Synthesis filter bank.

3.2 Adaptive gain equalizer:

Suppose we have an acoustic noise denoted w[n] and a speech signal denoted s[n]. The noise corrupted speech signal x[n] can then be written as x[n] = s[n] + w[n]. By filtering the input signal with the analysis filter bank, the signal is divided into K sub bands, each given by

x_k[n] = h_k[n] * x[n],   (3)

where k is the sub band index and * denotes the convolution operator. The input signal can thus be described as

x[n] = Σ_{k=0}^{K-1} x_k[n] = Σ_{k=0}^{K-1} ( s_k[n] + w_k[n] ),   (4)

where s_k[n] is the speech part of sub band k and w_k[n] is the noise part of sub band k. The output y[n] is formed as

y[n] = Σ_{k=0}^{K-1} G_k[n] x_k[n],   (5)

where G_k[n] is a gain function (the AGE weighting function) which applies a gain to each sub band, amplifying the signal when speech is active. Fig. 4 shows a simple block diagram of the AGE.

Fig. 4. Block diagram of the adaptive gain equalizer.

Two quantities are used in the calculation of the gain function: a long term (slow) average A_{k,slow}[n], which tracks the background noise floor level, and a short term (fast) average A_{k,fast}[n], which tracks the speech. The short term average for sub band k is calculated as

A_{k,fast}[n] = (1 - α_k) A_{k,fast}[n-1] + α_k |x_k[n]|,   (6)

where α_k is a small positive constant given by

α_k = 1 / (f_s T_{k,fast}),   (7)

where f_s is the sampling frequency in Hz and T_{k,fast} is a time constant in seconds. In the same way, the slow average is computed as

A_{k,slow}[n] = (1 + β_k) A_{k,slow}[n-1],   if A_{k,fast}[n] ≥ A_{k,slow}[n-1],
A_{k,slow}[n] = A_{k,fast}[n],   if A_{k,fast}[n] < A_{k,slow}[n-1],   (8)

where β_k is a small positive constant. The AGE gain function is computed as


G_k[n] = min( L_k, ( A_{k,fast}[n] / A_{k,slow}[n] )^{p_k} ),   (9)

where p_k is a positive constant that decides the gain rise applied individually to each of the sub band signals and L_k is an upper bound on the sub band gain.
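The gain recursion of Eqs. (6)-(9) can be sketched per sub band as follows. This is our own Python illustration rather than the authors' implementation; the default parameter values are assumptions chosen within the ranges discussed in Section 4, and the initialisation of the averages from the first 100 ms is our own choice.

```python
import numpy as np

def age_subband_gain(xk, fs=8000, T_fast=0.025, beta=1e-6, p=1.0, L=10 ** (10 / 20)):
    """Per-sample AGE gain for one sub band signal xk (Eqs. (6)-(9)).

    T_fast : short term averaging time constant in seconds (~20-30 ms)
    beta   : small constant controlling the noise floor rise, Eq. (8)
    p      : gain rise exponent p_k, Eq. (9)
    L      : upper bound L_k on the gain (here about 10 dB)
    """
    alpha = 1.0 / (T_fast * fs)                 # Eq. (7)
    mag = np.abs(xk)
    eps = 1e-12                                 # guards the initialisation (our addition)
    A_fast = A_slow = max(float(np.mean(mag[: fs // 10])), eps)
    gains = np.empty(len(xk))
    for i, sample in enumerate(mag):
        A_fast = (1 - alpha) * A_fast + alpha * sample     # Eq. (6)
        if A_fast >= A_slow:
            A_slow = (1 + beta) * A_slow                   # Eq. (8): slow upward creep
        else:
            A_slow = A_fast                                # track downwards immediately
        gains[i] = min(L, (A_fast / A_slow) ** p)          # Eq. (9)
    return gains

# Usage: gains = age_subband_gain(x_k); the weighted sub band gains * x_k
# is then fed to the synthesis filter bank and summed as in Eq. (5).
```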

4. EVALUATION OF RESULTS
A general mathematical description of the method was given in Section 3, where a number of sub band dependent parameters were introduced. In this section, where a practical evaluation is performed, some of these parameters are set to the same value for all sub bands. For evaluation purposes, we recorded a signal with a sampling frequency of 8 kHz. The method was verified in MATLAB before being implemented in real time. We then saved the filter coefficients of the analysis and synthesis filter banks in a format suitable for use on the DSK C6713 processor, and included these coefficients when implementing the method on the DSK C6713 using CC Studio. A sine wave of 2 Vp-p from a function generator was then input and the sine wave output of the DSP was observed.
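As an illustration of the coefficient export step, the Python sketch below (our own hypothetical helper; the header name, array name and plain-float format are assumptions, not the authors' actual file format) writes a coefficient vector into a C header that a CC Studio project for the DSK C6713 could include:

```python
import numpy as np

def export_coeffs_to_c_header(coeffs, name, path):
    """Write a 1-D coefficient array as a C float array (hypothetical format)."""
    lines = [f"#define {name.upper()}_LEN {len(coeffs)}",
             f"static const float {name}[{len(coeffs)}] = {{"]
    lines += [f"    {c:.8e}f," for c in coeffs]
    lines += ["};", ""]
    with open(path, "w") as f:
        f.write("\n".join(lines))

# Example: export an assumed 64-tap prototype (placeholder values for illustration)
h0 = np.ones(64) / 64.0
export_coeffs_to_c_header(h0, "proto_filter", "proto_filter.h")
```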
Fig. 6. A generated noise sequence is passed through the DSP system without any processing of the signal, and the PSD of the output signal is estimated.
Fig. 7. Welch PSD estimate (power/frequency in dB/rad/sample versus normalized frequency) of the output signal when noise is passed through the DSP system without any processing of the signal.

Fig. 5. Analysis-synthesis filter bank configuration used to verify the filter bank implementation. The sub band signals are scaled by sub band scaling constants to yield the sub band output signals.

To verify the implementation of the analysis and synthesis filter banks, we estimate the power spectral densities of the input noise sequence x[n] and the output y[n] and determine the transfer magnitude function. To do this, a noise sequence x[n] is generated using the randn command in MATLAB and stored as a wav-file. The generated noise sequence x[n] is first fed to a DSP project which simply copies the ADC input to the DAC output, i.e. without any filter bank processing, as in Fig. 6. We then calculate the PSD of the DAC output and store it as P_xx[f], as in Fig. 7. These signals are recorded at a frequency of 44 kHz. Next, we include the filter bank processing, calculate the PSD of the DAC output and store it as P_yy[f], as in Fig. 8 and Fig. 9. Using the PSDs of the output and input signals, we can calculate the transfer magnitude function as |H[f]|² = P_yy[f] / P_xx[f].


Fig. 8. PSD of the output signal after filter bank processing (without AGE).

We then compared the estimated transfer magnitude function to the ideal transfer magnitude function; agreement between the two verifies the filter bank implementation.
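The same check can be sketched offline as follows; this is our own Python illustration of the procedure, with scipy's Welch estimator standing in for the MATLAB PSD estimate and a hypothetical process() function standing in for the filter bank pass-through on the DSP:

```python
import numpy as np
from scipy.signal import welch

fs = 8000                         # assumed sampling rate for this offline check
x = np.random.randn(10 * fs)      # test noise sequence, as generated with randn

def process(signal):
    """Stand-in for the analysis/synthesis filter bank pass-through on the DSP."""
    return signal                 # identity here; replace with the filter bank chain

y = process(x)

# Welch PSD estimates of input and output
f, Pxx = welch(x, fs=fs, nperseg=1024)
_, Pyy = welch(y, fs=fs, nperseg=1024)

# Transfer magnitude function |H(f)|^2 = Pyy(f) / Pxx(f); ideally ~1 (0 dB)
H2_dB = 10 * np.log10(Pyy / Pxx)
print("max deviation from 0 dB:", np.max(np.abs(H2_dB)), "dB")
```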

Next, we input a speech signal x[n] containing some music and noise to the DSP and observe the output signal after the filter bank processing (without the AGE), checking whether the same speech signal can be heard without any disturbance or noise interference; this verifies the DSP implementation. We then follow the same procedure with the AGE included and observe the resulting speech signal. Based on the SNR estimate, the speech signal is amplified and the noise is attenuated, resulting in an enhanced speech signal, in which the short term and long term averaging constants play an important role in determining the output.


Fig. 9. Welch PSD estimate of the output signal P_yy[f] after using the AGE noise suppressor.

As the gain function G_k[n] is a ratio of the short term average and the noise floor level estimate, care must be taken to avoid singularities causing numerical overflow in the DSP. In this implementation, any sample falling outside the DSP range was therefore clipped to the minimum possible sample value.

This method focuses on speech enhancement rather than noise suppression: we do not try to improve the SNR by removing noise, but by amplifying the speech. The time constants that control the short term average should be kept in the range of the pseudo-stationary time of speech, i.e. about 20-30 ms. The parameter controlling the noise floor level estimate can be varied around 10^-6, depending on the desired effect. The upper bound of the gain function affects the resulting speech distortion and should be kept within 5-20 dB; a larger amplification of the signal may result in piercing sounding output speech.

A small short term time constant results in unnatural sounding speech with remaining artifacts, while a very large value results in a short term average that reacts too slowly to variations in the incoming signal amplitude; speech attacks will then be cropped, and the amplification of speech accompanied by a small amount of noise will be limited. The positive constant in the noise floor update controls how fast the noise floor level estimate adapts to changes in the noise environment. A small value results in a noise floor level estimate equal to the short term average, while a very large value results in slow convergence and poor noise level tracking capabilities in non-stationary environments.

Before implementing the AGE in real time, it was implemented in MATLAB; the results obtained from the off-line MATLAB simulation and the on-line real-time DSP implementation of the algorithm are very similar.

Fig. 10. Input noisy speech signal.

Fig. 11. Enhanced speech signal obtained from the DSP using the AGE noise suppression method.

5. CONCLUSION
A preprocessing noise suppression algorithm based on the AGE was developed, implemented and tested using filter bank techniques on a DSP processor kit with the Code Composer Studio tool. The AGE algorithm is a straightforward, robust and flexible method for speech enhancement.








Anil Chokkarapu was born in Huzarabad, India in 1987. He completed his dual Masters (M.Sc. and M.Tech.) programme in Electrical Engineering with emphasis on Signal Processing at Blekinge Institute of Technology, Karlskrona, Sweden, during 2010-2012. He completed his Bachelor degree in Electronics and Communication Engineering from Jawaharlal Nehru Technological University (JNTU) Hyderabad, India, in 2009. His research interests are in Audio/Speech Processing, hearing aids and Biomedical Signal Processing.

Abhiram Chinthakuntla was born in Warangal, India in 1988. He completed his Bachelor degree in Electronics and Communication Engineering from Jawaharlal Nehru Technological University (JNTU) Hyderabad. He completed his Masters programme in Electrical Engineering with emphasis on Signal Processing at Blekinge Institute of Technology, Karlskrona, Sweden. His areas of interest are digital signal processing, image/audio processing, adaptive systems and neural networks.

Sarath Chandra Uppalapati was born in 1987 in Hyderabad, India. He completed his Master of Science (M.Sc.) in Electrical Engineering with emphasis on Signal Processing at Blekinge Institute of Technology, Sweden, and his Master of Technology (M.Tech.) in Signal Processing at JNTU, India. He completed his Bachelor degree in Electronics and Communication Engineering from Jawaharlal Nehru Technological University (JNTU), India. He worked as a Trainee Engineer at Metox Resistor Industries from July 2008 until August 2009. His current research interests are Speech Processing, Image Processing and Neural Networks.
