Using deep learning to improve the intelligibility of a target speaker in noisy multi-talker environments for people with normal hearing and hearing loss.

Thoidis, Iordanis; Goehring, Tobias

Affiliation

Thoidis I; School of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece.
Goehring T; Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom.

J Acoust Soc Am ; 156(1): 706-724, 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-39082692

ABSTRACT

Understanding speech in noisy environments is a challenging task, especially in communication situations with several competing speakers. Despite their ongoing improvement, assistive listening devices and speech processing approaches still do not perform well enough in noisy multi-talker environments, as they may fail to restore the intelligibility of a speaker of interest among competing sound sources. In this study, a quasi-causal deep learning algorithm was developed that can extract the voice of a target speaker, as indicated by a short enrollment utterance, from a mixture of multiple concurrent speakers in background noise. Objective evaluation with computational metrics demonstrated that the speaker-informed algorithm successfully extracts the target speaker from noisy multi-talker mixtures. This was achieved using a single algorithm that generalized to unseen speakers, different numbers of speakers and relative speaker levels, and different speech corpora. Double-blind sentence recognition tests on mixtures of one, two, and three speakers in restaurant noise were conducted with listeners with normal hearing and listeners with hearing loss. Results indicated significant intelligibility improvements with the speaker-informed algorithm of 17% and 31% for people without and with hearing loss, respectively. In conclusion, it was demonstrated that deep learning-based speaker extraction can enhance speech intelligibility in noisy multi-talker environments where uninformed speech enhancement methods fail.

Subject(s)

Deep Learning; Noise; Speech Intelligibility; Speech Perception; Humans; Noise/adverse effects; Female; Male; Adult; Middle Aged; Hearing Loss/physiopathology; Hearing Loss/psychology; Young Adult; Aged; Algorithms; Hearing; Perceptual Masking

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Speech Intelligibility / Speech Perception / Deep Learning / Noise | Limits: Adult / Aged / Female / Humans / Male / Middle aged | Language: English | Journal: J Acoust Soc Am | Year: 2024 | Document type: Article | Country of affiliation: Greece