
Project Report

Implementation and Comparison of Noise Removal and Echo Cancellation for Audio Signals
Adersh Miglani
eez118471@ee.iitd.ac.in
Course: SIV864
Indian Institute of Technology, Delhi

I. PURPOSE

Digital signal processing, the sources of noise, the measurement of information loss, and the enhancement and suppression of signals are all important in studying the filtering of information from a signal. Speech signals are evaluated and processed in a transformed domain using digital signal processing to reduce noise and to remove undesired speech signals. The transmission medium, compression techniques and noisy environments are the main sources of speech degradation. The type of noise signal depends on its source. The purpose of this project is to study some noise removal and echo cancellation techniques and to analyze some basic implementation results.

This report is organized as follows. Section II discusses objective methods for evaluating improvement in the quality of a speech signal. Section III discusses measurement and analysis of the noise power spectrum, and then describes techniques for removing dominant noise components and cancelling echo in speech signals, including a review of the current literature on speech enhancement. Section IV analyzes experimental results for some known speech enhancement techniques.

II. OBJECTIVE MEASUREMENT OF NOISE

The quality and intelligibility of a speech signal should be measured to quantify how far the degradation has been reversed [1]. There are two categories of methods for measuring the amount of noise present before and after speech processing. First, subjective measurement techniques require the intervention of human listeners. These techniques are standardized for phonetic tests [2] and for word and sentence intelligibility methods. Second, objective measurement techniques compare the original and processed signals, and their results are considered reliable when compared with subjective tests. Objective techniques are further divided into two groups: intrusive and non-intrusive methods.
Intrusive methods are used when the original speech signal is clean and the processed signal has passed through a communication channel, compression and decompression cycles, and/or other speech processing techniques. Both signals are divided into short windows of 10 to 30 milliseconds. The signal-to-noise ratio is measured as a local score for each window and as a global score for the complete signal.
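The windowed scoring described above can be sketched as follows. This is a minimal illustration, not the exact procedure used in the cited studies: the function name `segmental_snr`, the non-overlapping frames, the small `eps` guard, and the choice of the mean of the per-window scores as the global score are all assumptions made here.

```python
import math


def segmental_snr(clean, processed, frame_len, eps=1e-12):
    """Return (per-window SNRs in dB, their mean) for two equal-length signals.

    Assumption: non-overlapping windows of `frame_len` samples, and the
    global score taken as the mean of the local (per-window) scores.
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")

    local_scores = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        # Energy of the clean window and of the error (clean - processed).
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        # eps keeps the ratio finite for silent or perfectly matched windows.
        ratio = (signal_energy + eps) / (noise_energy + eps)
        local_scores.append(10.0 * math.log10(ratio))

    if not local_scores:
        raise ValueError("signal is shorter than one window")
    return local_scores, sum(local_scores) / len(local_scores)
```

At a sampling rate of 8 kHz, for instance, a 20 ms window corresponds to `frame_len=160`. Practical implementations often also clamp each local score to a fixed dB range before averaging; that step is left out of this sketch.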

Those are called segmental SNR techniques. Some experiments have been performed to compare these scores with subjective tests [3] [4]. The difference between the noisy and processed signals is multiplied by a constant term that is chosen based on the clean signal.

Non-intrusive methods are used when the original clean signal is not available [5]. The amount of enhancement is computed from the noisy and processed signals alone. These methods are primarily used for live broadcasts or when playing stored audio signals.

Loss of intelligibility in audio is due to distortion in the speech signal, background noise, or both. Hu and Loizou evaluated various objective quality measures [6], including segmental SNR, weighted spectral slope (WSS), PESQ, log-likelihood ratio (LLR), Itakura-Saito distance (IS), and cepstrum distance (CEP). They studied these measures extensively and reported estimated correlation coefficients and standard deviations of the objective measures against overall quality, signal distortion, and distortion due to background noise. They concluded that the segmental SNR formula gave poor results against overall quality and therefore should not be used as a performance measure for enhancement algorithms. The study also shows that most of the measures give better results for signal distortion than for background noise. The selection of a measure should therefore also take into account the type of noise to be treated. Ma and Loizou [7] proposed three measures that account for distortions introduced into the processed speech by enhancement algorithms. These three measures, $\mathrm{SNR}_{\mathrm{LOSS}}$, ESC and $\mathrm{SNR}_{\mathrm{LESC}}$, are derived from the SNR and were tested on consonant and sentence signals.

III. SPEECH ENHANCEMENT TECHNIQUES

In the previous section, degradation of the speech signal and addition of echo were considered as the two broad causes of loss of speech intelligibility.
Here, both are described in terms of the signal processing methods used to remove them [1]. The speech signal is divided into short overlapping windows, generally with 50% overlap. The length of signal in each window is in the range of 10 to 30 milliseconds. The short-time Fourier transform is applied to each window and subsequent processing is performed on the result:

$$S(e^{j\omega}) = G(e^{j\omega})\, X(e^{j\omega})$$

Spectral subtraction is the most commonly used method for removing background noise [8]. Hamming coefficients are used to subtract a part of the magnitude of the noisy signal, and the phase is left unaltered. This method leaves behind broadband noise and narrow-band spectral spikes, which are responsible for tonal noise. Some improvements have been suggested that modify the gain function $G(e^{j\omega})$ [9]. Here, SNR-based non-intrusive speech evaluation measures are used to quantify the enhancement.

Recent enhancement algorithms process the signal in both the time and frequency domains to remove background noise [10]. This method addresses high-SNR regions in the time domain while removing degradation in the spectral domain. Temporal and spectral processing based methods have also been proposed for echo cancellation [11]. This method uses signal-to-reverberation ratio (SRR) regions in the temporal domain. The spectral and temporal processing are performed in sequence. The segmental SRR and log spectral distance are computed as objective measures.

Spectral subtraction based methods are combined with RASTA processing to remove tonal noise along with broadband and additive stationary noise. A non-stationary noise environment introduces additional complexity. To handle it, the optimally-modified log-spectral amplitude (OM-LSA) speech estimator and the minima controlled recursive averaging (MCRA) noise estimator are used before applying the spectral gain function [12].

IV. EXPERIMENTS AND RESULTS

Background noise and distortion due to reverberation are common degradations in speech. Two experiments are performed to remove these two degradations from mono and stereo speech signals. For removing both echo and background noise, an intrusive objective technique is used.
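As an illustration of the magnitude spectral subtraction idea reviewed in Section III, the sketch below uses NumPy. It is not the code used for the experiments in this report. The function names, the estimation of the noise magnitude from a separate noise-only segment, the `floor` parameter, and the overlap-add normalization are assumptions chosen for the example; the 256-sample Hamming window and 50% overlap follow the text.

```python
import numpy as np


def estimate_noise_magnitude(noise_only, frame_len=256):
    """Average magnitude spectrum of a noise-only segment (assumed available).

    The segment must contain at least `frame_len` samples.
    """
    hop = frame_len // 2
    window = np.hamming(frame_len)
    spectra = [
        np.abs(np.fft.rfft(noise_only[s:s + frame_len] * window))
        for s in range(0, len(noise_only) - frame_len + 1, hop)
    ]
    return np.mean(spectra, axis=0)


def spectral_subtract(noisy, noise_mag, frame_len=256, floor=0.0):
    """Subtract `noise_mag` from each frame's magnitude, keeping the noisy phase."""
    hop = frame_len // 2                      # 50% overlap
    window = np.hamming(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))

    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)
        magnitude = np.abs(spectrum)
        phase = np.angle(spectrum)            # phase is left unaltered
        # Subtract the noise estimate; never go below floor * magnitude.
        cleaned = np.maximum(magnitude - noise_mag, floor * magnitude)
        enhanced = np.fft.irfft(cleaned * np.exp(1j * phase), n=frame_len)
        out[start:start + frame_len] += enhanced
        norm[start:start + frame_len] += window

    # Undo the analysis window where at least one frame contributed.
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out
```

With `floor=0.0` the subtraction is a plain half-wave rectification; a small positive `floor` is one common way to reduce the residual tonal noise mentioned in Section III, at the cost of leaving more broadband noise in the output.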
Echo effects are added to the clean speech, and an FFT magnitude truncation method is used to remove them. The clean signal and the enhanced signal after processing are plotted for the 1st and 2nd channels in Fig. 1 and Fig. 2.

The spectral subtraction algorithm is used to remove background distortion from a given noisy speech signal [8]. The Hamming window size is 256. A standard MATLAB function is used to generate the Hamming coefficients, which are then used to remove background noise in the transformed domain. SNR loss is the intrusive measure used to quantify the enhancement in the processed signal. The variances of the noisy and enhanced signals are computed for the SNR loss:

$$\mathrm{SNR}_{\text{noise}} = 10 \log_{10}\left(\frac{\operatorname{variance}(\text{clean})}{\operatorname{variance}(\text{noisy})}\right)$$

Fig. 1. Echo cancellation from 1st channel

Fig. 2. Echo cancellation from 2nd channel

$$\mathrm{SNR}_{\text{enhanced}} = 10 \log_{10}\left(\frac{\operatorname{variance}(\text{clean})}{\operatorname{variance}(\text{enhanced clean})}\right)$$
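The two variance ratios above can be evaluated with a few lines of Python. This is only a sketch of the formulas as written in this report: the function name `variance_snr_db` and the use of the population variance are assumptions, and the report's MATLAB code is not reproduced here.

```python
import math
import statistics


def variance_snr_db(clean, other):
    """10*log10(variance(clean) / variance(other)), as in the formulas above.

    `other` is the noisy signal for SNR_noise or the enhanced signal for
    SNR_enhanced. Population variance is assumed; both inputs must be
    non-constant so that neither variance is zero.
    """
    return 10.0 * math.log10(
        statistics.pvariance(clean) / statistics.pvariance(other)
    )
```

Calling the function once with the noisy signal and once with the enhanced signal gives the two scores, and comparing them indicates how much the processing changed the ratio.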

The plot of the clean, noisy and enhanced signals is shown in Fig. 3.

Fig. 3. Spectral noise removal

REFERENCES

[1] C. Labs, "Speech Enhancement Tutorial." [Online]. Available: http://www.clear-labs.com/tutorial
[2] R. L. Miller, "Nature of the vocal cord wave," J. Acoust. Soc. Am., vol. 31, no. 6, pp. 667-677, Jun. 1959.
[3] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, "Perceptual evaluation of speech quality (PESQ) - a new method for speech quality assessment of telephone networks and codecs," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 2, 2001, pp. 749-752.
[4] T. Yamada, M. Kumakura, and N. Kitawaki, "Subjective and objective quality assessment of noise reduced speech signals," in Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip, May 2005, p. 28.
[5] A. Rix, "Perceptual speech quality assessment - a review," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), vol. 3, May 2004, pp. iii-1056-9.
[6] Y. Hu and P. C. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229-238, 2008. [Online]. Available: http://dx.doi.org/10.1109/TASL.2007.911054
[7] J. Ma and P. C. Loizou, "SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech," Speech Communication, vol. 53, no. 3, pp. 340-354, 2011.
[8] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.
[9] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '79), vol. 4, Apr. 1979, pp. 208-211.
[10] P. Krishnamoorthy and S. R. M. Prasanna, "Enhancement of noisy speech by temporal and spectral processing," Speech Commun., vol. 53, no. 2, pp. 154-174, Feb. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.specom.2010.08.011
[11] P. Krishnamoorthy and S. R. M. Prasanna, "Reverberant speech enhancement by temporal and spectral processing," Trans. Audio, Speech and Lang. Proc., vol. 17, no. 2, pp. 253-266, Feb. 2009. [Online]. Available: http://dx.doi.org/10.1109/TASL.2008.2008039
[12] I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Processing, vol. 81, no. 11, pp. 2403-2418, 2001.
