Search | BVS - MINISTÉRIO DA SAÚDE

Restoring speech intelligibility for hearing aid users with deep learning.

Diehl, Peter Udo; Singer, Yosef; Zilly, Hannes; Schönfeld, Uwe; Meyer-Rachner, Paul; Berry, Mark; Sprekeler, Henning; Sprengel, Elias; Pudszuhn, Annett; Hofmann, Veit M.

Sci Rep; 13(1): 2719, 2023 Feb 15.

Article in English | MEDLINE | ID: mdl-36792797

ABSTRACT

Almost half a billion people worldwide suffer from disabling hearing loss. While hearing aids can partially compensate for this, a large proportion of users struggle to understand speech in situations with background noise. Here, we present a deep learning-based algorithm that selectively suppresses noise while maintaining speech signals. The algorithm restores speech intelligibility for hearing aid users to the level of control subjects with normal hearing. It consists of a deep network that is trained on a large custom database of noisy speech signals and is further optimized by a neural architecture search, using a novel deep learning-based metric for speech intelligibility. The network achieves state-of-the-art denoising on a range of human-graded assessments, generalizes across different noise categories and, in contrast to classic beamforming approaches, operates on a single microphone. The system runs in real time on a laptop, suggesting that large-scale deployment on hearing aid chips could be achieved within a few years. Deep learning-based denoising therefore holds the potential to improve the quality of life of millions of hearing-impaired people soon.

Subjects

Deep Learning, Hearing Aids, Hearing Loss, Sensorineural, Speech Perception, Humans, Speech Intelligibility, Quality of Life

Non-intrusive deep learning-based computational speech metrics with high-accuracy across a wide range of acoustic scenes.

Diehl, Peter Udo; Thorbergsson, Leifur; Singer, Yosef; Skripniuk, Vladislav; Pudszuhn, Annett; Hofmann, Veit M; Sprengel, Elias; Meyer-Rachner, Paul.

PLoS One; 17(11): e0278170, 2022.

Article in English | MEDLINE | ID: mdl-36441711

ABSTRACT

Speech with high sound quality and little noise is central to many of our communication tools, including calls, video conferencing and hearing aids. While human ratings provide the best measure of sound quality, they are costly and time-intensive to gather, so computational metrics are typically used instead. Here we present a non-intrusive, deep learning-based metric that takes only a sound sample as input and returns ratings in three categories: overall quality, noise, and sound quality. This metric is available via a web API and is composed of an ensemble of 5 deep neural networks that use either ResNet-26 architectures with STFT inputs or fully connected networks with wav2vec features as inputs. The networks are trained and tested on over 1 million crowd-sourced human sound ratings across the three categories. Correlations of our metric with human ratings exceed or match other state-of-the-art metrics on 51 out of 56 benchmark scenes, without requiring clean speech reference samples, unlike the metrics that perform well on the other 5 scenes. The benchmark scenes represent a wide variety of acoustic environments and a large selection of post-processing methods that include classical methods (e.g. Wiener filtering) and newer deep learning methods.

Subjects

Deep Learning, Speech, Humans, Benchmarking, Acoustics, Sound
