IJIGSP Vol. 11, No. 9, 8 Aug. 2019
Full Text (PDF, 1375KB), PP.44-55
Speech enhancement, wavelet thresholding, multitaper power spectrum, noise power estimation, smoothing parameter, SNR, threshold
This paper presents a method to reduce the musical noise encountered with most frequency-domain speech enhancement algorithms. Musical noise is a phenomenon caused by random spectral peaks in each speech frame, which arise from the large variance and inaccurate estimation of the spectra of the noisy speech and noise signals. To obtain a low-variance spectral estimate, this paper uses a method based on wavelet thresholding of the multitaper spectrum, combined with a noise estimation algorithm that estimates the noise spectrum from a spectral average of past and present spectra according to a predetermined weighting factor. To evaluate the method, sine multitapers were used and the spectral coefficients were thresholded using wavelet thresholding to obtain a low-variance spectrum. Both scale-dependent and scale-independent thresholding, each with soft and hard thresholding using Daubechies wavelets, were used to evaluate the proposed method in terms of objective quality measures under eight different types of real-world noise at three input SNR levels. To predict speech quality in the presence of noise, objective quality measures such as Segmental SNR, Weighted Spectral Slope Distance, Log Likelihood Ratio, Perceptual Evaluation of Speech Quality (PESQ), and composite measures are compared against those of wavelet de-noising techniques, Spectral Subtraction, and Multiband Spectral Subtraction. The proposed method provides consistent performance across all eight noise types in most of the cases considered.
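The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the number of tapers (5) and the weighting factor (alpha = 0.9) are assumed values, and the noise update is assumed to be a first-order recursive average. The wavelet transform itself is omitted; the two thresholding functions would be applied to wavelet coefficients produced by a separate discrete wavelet transform.

```python
import cmath
import math


def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers: a_k[n] = sqrt(2/(N+1)) * sin(pi*k*(n+1)/(N+1))."""
    scale = math.sqrt(2.0 / (n_samples + 1))
    return [
        [scale * math.sin(math.pi * k * (n + 1) / (n_samples + 1))
         for n in range(n_samples)]
        for k in range(1, n_tapers + 1)
    ]


def power_spectrum(frame):
    """Squared-magnitude DFT of one frame (direct O(N^2) sum, adequate for a demo)."""
    n = len(frame)
    spectrum = []
    for m in range(n):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * m * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def multitaper_spectrum(frame, n_tapers=5):
    """Average the power spectra of the frame weighted by each sine taper."""
    tapers = sine_tapers(len(frame), n_tapers)
    spectra = [power_spectrum([w * x for w, x in zip(taper, frame)])
               for taper in tapers]
    return [sum(column) / n_tapers for column in zip(*spectra)]


def soft_threshold(coeffs, threshold):
    """Shrink every coefficient toward zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def hard_threshold(coeffs, threshold):
    """Zero coefficients whose magnitude does not exceed `threshold`."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def update_noise_estimate(previous, current, alpha=0.9):
    """Assumed recursive average: alpha * past estimate + (1 - alpha) * present spectrum."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]
```

Averaging over several orthogonal tapers is what lowers the variance of the spectral estimate relative to a single-window periodogram. The weighting factor alpha controls how quickly the noise estimate follows changes in the present spectrum.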
P. Sunitha, K. Satya Prasad, "Speech Enhancement based on Wavelet Thresholding the Multitaper Spectrum Combined with Noise Estimation Algorithm", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 11, No. 9, pp. 44-55, 2019. DOI: 10.5815/ijigsp.2019.09.05