Understanding the intelligibility of speech after noise reduction : a comparison of three predictive models

start_date: 2009/11/13
schedule: 10h-12h
online: no
location_info: amphi M. Bloch
details: part of the Exacola seminar series of the SpiN group
summary: Understanding the intelligibility of speech after noise reduction: a comparison of three predictive models.
Hilkhuysen, Gaston and Huckvale, Mark. Centre for Law Enforcement Audio Research (CLEAR), Department of Speech, Hearing and Phonetic Sciences, University College London; G.Hilkhuysen@ucl.ac.uk

With ever-increasing computational power available in audio devices such as hearing aids and mobile phones, noise reduction algorithms are becoming more widely used. Although noise reduction gives an apparent improvement in speech quality, these algorithms have typically been found to decrease intelligibility (e.g., Hu and Loizou, 2007). Furthermore, the auditory and perceptual mechanisms responsible for the deterioration in intelligibility are poorly understood, which limits our ability to improve existing algorithms or to design new ones that might improve intelligibility. This paper applies three predictive models of intelligibility to three noise reduction algorithms in an attempt to shed light on this problem.

Intelligibility measurements were obtained from speech in two types of noise (car and babble) processed by three different noise reduction algorithms (spectral subtraction, minimum mean square error and subspace analysis). All noise reduction algorithms reduced intelligibility to some degree, with the damage done by a specific algorithm depending on the speech-to-noise ratio (SNR) of the original signal. The intelligibility scores for the processed signals were then modeled using three speech intelligibility models: the Speech Intelligibility Index (SII) (ANSI, 1997), the Coherence Speech Intelligibility Index based on the mid part of the dynamic range in speech (CSIImid) (Kates and Arehart, 2005), and the modulation SII (SIImod). The latter is an adaptation of the SII, based on the concept of signal-to-noise ratio in the modulation domain as proposed by Dubbelboer and Houtgast (2008).
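The SII mentioned above is, at its core, an importance-weighted sum of per-band audibility values derived from the band SNR. The following is a minimal sketch of that core idea only, not the full ANSI procedure: it omits masking spread, level distortion and hearing thresholds, and it assumes equal band-importance weights. The function names and the example SNR values are hypothetical and are not data from the study.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to an audibility value in [0, 1].

    Uses the linear mapping (SNR + 15) / 30, clipped to [0, 1], so that
    -15 dB or lower gives 0 and +15 dB or higher gives 1.
    """
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def simplified_sii(band_snrs_db, band_importance=None):
    """Importance-weighted mean of band audibilities (simplified SII).

    band_importance defaults to equal weights, which is an assumption
    made for illustration; the standard tabulates per-band weights.
    """
    n = len(band_snrs_db)
    if band_importance is None:
        band_importance = [1.0 / n] * n
    total = sum(band_importance)
    weighted = sum(w * band_audibility(s)
                   for w, s in zip(band_importance, band_snrs_db))
    return weighted / total


if __name__ == "__main__":
    # Invented band SNRs (dB) before and after a hypothetical noise
    # reduction step that raises the SNR mostly in the poorest bands.
    noisy = [-20.0, -5.0, 0.0, 10.0]
    processed = [-12.0, 0.0, 3.0, 11.0]
    print(simplified_sii(noisy), simplified_sii(processed))
```

Because the index in this sketch depends only on band SNR, any processing that raises the band SNRs raises the index, whatever distortion the processing introduces.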
The models varied in their ability to predict the effect of noise reduction. Each noise reduction algorithm removed more energy from the parts of the signal with the lowest SNR, thereby improving the overall SNR; it is therefore not surprising that the SII calculations falsely predict an increase in intelligibility for all types of noise reduction. CSIImid gave correct predictions in some cases. At some SNRs of babble noise, noise reduction decreased the coherence between the original speech and noisy speech, and for these conditions the CSIImid predictions correctly indicate reductions in intelligibility. For speech in car noise, however, CSIImid incorrectly predicts that noise reduction improves intelligibility. Only the predictions based on SIImod correctly showed decreasing or stable intelligibility scores after noise reduction.

These findings suggest that additional temporal modulations introduced by noise reduction could be a factor responsible for the observed drop in intelligibility. However, none of SII, CSIImid or SIImod could account for the SNR-dependent effects of noise reduction observed in the data. Additionally, the promising results of SIImod could only be obtained by relying more heavily on traditional SII calculations than suggested by Dubbelboer and Houtgast (2008). We aim to explore SIImod in subsequent listening experiments by actively manipulating the signal-to-noise ratio in the modulation domain (by varying the noise reduction parameter settings) to further investigate its merits.

Acknowledgements
The CLEAR project is supported by the Home Office.

References
American National Standards Institute. 1997. American National Standard Methods for the calculation of the speech intelligibility index. ANSI S3.5-1997. New York: ANSI.
Dubbelboer, F. and Houtgast, T. 2008. The concept of signal-to-noise ratio in the modulation domain and speech intelligibility. J Acoust Soc Am, 124, 3937-46.
Hu, Y. and Loizou, P.C. 2007. A comparative intelligibility study of single-microphone noise reduction algorithms. J Acoust Soc Am, 122, 1777-86.
Kates, J.M. and Arehart, K.H. 2005. Coherence and the speech intelligibility index. J Acoust Soc Am, 117, 2224-37.
responsibles: Kern