Low-dimensional interference of mid-level sound statistics predicts human speech recognition in natural environmental noise.

Clonan, Alex C; Zhai, Xiu; Stevenson, Ian H; Escabí, Monty A

Affiliation

Clonan AC; Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269.
Zhai X; Biomedical Engineering, University of Connecticut, Storrs, CT 06269.
Stevenson IH; Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269.
Escabí MA; Biomedical Engineering, Wentworth Institute of Technology, Boston, MA 02115.

bioRxiv ; 2024 Feb 14.

Article in English | MEDLINE | ID: mdl-38405870

ABSTRACT

Recognizing speech in noise, such as in a busy street or restaurant, is an essential listening task in which difficulty varies across acoustic environments and noise levels. Yet current cognitive models are unable to account for changing real-world hearing sensitivity. Here, using natural and perturbed background sounds, we demonstrate that the spectrum and modulation statistics of environmental backgrounds drastically impact human word recognition accuracy, and that they do so independently of the noise level. These sound statistics can facilitate or hinder recognition: at the same noise level, accuracy can range from 0% to 100%, depending on the background. To explain this perceptual variability, we optimized a biologically grounded hierarchical model consisting of frequency-tuned cochlear filters and subsequent mid-level modulation-tuned filters that account for central auditory tuning. Low-dimensional summary statistics from the mid-level model accurately predict single-trial perceptual judgments, accounting for more than 90% of the perceptual variance across backgrounds and noise levels and substantially outperforming a cochlear model. Furthermore, perceptual transfer functions in the mid-level auditory space identify multi-dimensional natural sound features that impact recognition. Thus, speech recognition in natural backgrounds involves interference of multiple summary statistics that are well described by an interpretable, low-dimensional auditory model. Since this framework relates salient natural sound cues to single-trial perceptual judgments, it may improve outcomes for auditory prosthetics and clinical measurements of real-world hearing sensitivity.
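The two-stage idea in the abstract (frequency-tuned filters followed by modulation-tuned filters, reduced to a few summary statistics) can be illustrated with a minimal sketch. This is not the authors' model: ideal FFT band masks stand in for cochlear filters, modulation tuning is approximated by summing envelope power in a few modulation bands, and the function names `band_envelopes` and `modulation_stats`, the band edges, and the toy input are all assumptions made for illustration.

```python
import numpy as np


def band_envelopes(x, fs, band_edges):
    """Stage 1 (hypothetical): envelopes of x in adjacent frequency bands (Hz).

    Ideal FFT-domain band masks stand in for cochlear filters. Each mask keeps
    only positive frequencies, so the inverse FFT is the analytic signal of
    that band and its magnitude is the Hilbert envelope. Band edges are
    assumed to lie above 0 Hz and below the Nyquist frequency.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    envelopes = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes.append(np.abs(analytic))
    return np.array(envelopes)


def modulation_stats(envelopes, fs, mod_edges):
    """Stage 2 (hypothetical): envelope power per modulation band.

    Returns one row per frequency band and one column per modulation band.
    """
    n = envelopes.shape[1]
    mod_freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Remove each envelope's mean so the DC term does not dominate.
    centered = envelopes - envelopes.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    stats = np.empty((envelopes.shape[0], len(mod_edges) - 1))
    for j, (lo, hi) in enumerate(zip(mod_edges[:-1], mod_edges[1:])):
        mask = (mod_freqs >= lo) & (mod_freqs < hi)
        stats[:, j] = power[:, mask].sum(axis=1)
    return stats


# Toy input (not data from the study): a 1 kHz tone modulated at 8 Hz.
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)

envelopes = band_envelopes(x, fs, band_edges=[500, 2000, 3500])
stats = modulation_stats(envelopes, fs, mod_edges=[2, 6, 12, 32])
print(stats.shape)
```

In this toy case the 8 Hz fluctuation shows up in the 6 to 12 Hz modulation column of the first frequency band, while the second frequency band carries essentially no energy. A matrix of this kind is one simple example of a low-dimensional summary of a sound; how the study's model defines, weights, and fits its statistics to behavior is not specified in the abstract.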

Keywords

behavioral model; cocktail party; energetic masking; modulation masking; natural sounds; neural model; psychoacoustics; sound statistics; speech in noise; texture statistics

Full text: 1 Database: MEDLINE Language: En Publication year: 2024 Document type: Article
