ABSTRACT
Neurophysiology research has demonstrated that it is both possible and valuable to investigate sensory processing in scenarios involving continuous sensory streams, such as speech and music. Over the past decade, novel analytic frameworks, combined with growing participation in data sharing, have led to a surge of publicly available datasets involving continuous sensory experiments. However, open science efforts in this domain of research remain scattered, lacking a cohesive set of guidelines. This paper presents an end-to-end open science framework for the storage, analysis, sharing, and re-analysis of neural data recorded during continuous sensory experiments. The framework has been designed to interface easily with existing toolboxes, such as EelBrain, NapLib, MNE, and the mTRF-Toolbox. We present guidelines from both the user's perspective (how to rapidly re-analyse existing data) and the experimenter's perspective (how to store, analyse, and share data), making the process as straightforward and accessible as possible for all users. Additionally, we introduce a web-based data browser that enables effortless replication of published results and data re-analysis.
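As an illustration of how shared continuous-speech data of this kind can be re-analysed with one of the toolboxes named above, the sketch below fits a forward temporal response function (TRF) with MNE-Python's ReceptiveField estimator. The file names, sampling rate, lag range, and regularisation value are illustrative placeholders, not part of any particular dataset or of the proposed framework.

    # A minimal re-analysis sketch, assuming a downsampled speech envelope and
    # preprocessed EEG stored as NumPy arrays (file names and sampling rate
    # below are assumed for illustration).
    import numpy as np
    from mne.decoding import ReceptiveField

    sfreq = 64.0                                   # assumed sampling rate (Hz)
    envelope = np.load("stimulus_envelope.npy").reshape(-1, 1)  # (n_times, 1)
    eeg = np.load("eeg_preprocessed.npy")          # (n_times, n_channels)

    # Forward TRF: predict each EEG channel from the lagged speech envelope.
    trf = ReceptiveField(tmin=-0.1, tmax=0.4, sfreq=sfreq,
                         estimator=1.0,            # ridge regularisation
                         scoring="corrcoef")
    trf.fit(envelope, eeg)
    print(trf.coef_.shape)                         # (n_channels, 1, n_delays)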
ABSTRACT
Everyday environments often contain distracting competing talkers and background noise, requiring listeners to focus their attention on one acoustic source and reject others. During this auditory attention task, listeners may naturally interrupt their sustained attention and switch attended sources. The effort required to perform such an attention switch has not been well studied in the context of competing continuous speech. In this work, we developed two variants of endogenous attention switching and a sustained-attention control. We characterized these three experimental conditions in the context of decoding auditory attention, while simultaneously evaluating listening effort and neural markers of spatial-audio cues. A least-squares, electroencephalography (EEG)-based attention decoding algorithm was implemented across all conditions. It achieved accuracies of 69.4% and 64.0% when computed over non-overlapping 10-s and 5-s correlation windows, respectively. Both decoders showed smooth transitions in the attended-talker prediction through switches, lagging by approximately half of the analysis window size (e.g., the mean lag across the two switch conditions was 2.2 s when the 5-s correlation window was used). Expended listening effort, as measured by simultaneous EEG and pupillometry, was also a strong indicator of whether listeners sustained attention or performed an endogenous attention switch (peak pupil diameter, p = 0.034; minimum parietal alpha power, p = 0.016). We additionally found evidence of talker spatial cues in the form of centrotemporal alpha power lateralization (p = 0.0428). These results suggest that listener effort and spatial cues may be promising features to pursue in a decoding context, in addition to speech-based features.
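A minimal sketch of the windowed least-squares decoding step described above follows. The lag count, window length, and regularisation value are illustrative assumptions rather than the parameters used in the study; the backward model reconstructs the attended envelope and the talker with the higher within-window correlation is selected.

    # Assumed shapes: EEG arrays are (n_times, n_channels); envelopes are
    # one-dimensional and sampled at the same rate as the EEG.
    import numpy as np

    def lagged(eeg, n_lags):
        """Stack time-lagged copies of the EEG: (n_times, n_channels * n_lags)."""
        n_t, n_ch = eeg.shape
        X = np.zeros((n_t, n_ch * n_lags))
        for k in range(n_lags):
            X[k:, k * n_ch:(k + 1) * n_ch] = eeg[:n_t - k]
        return X

    def decode(eeg_test, env_a, env_b, eeg_train, env_train,
               n_lags=16, win=640, lam=1e3):
        """Train a ridge-regularised least-squares backward model, reconstruct
        the attended envelope, and pick the talker with the higher Pearson
        correlation in each non-overlapping window of `win` samples
        (640 samples = 10 s at an assumed 64-Hz rate)."""
        Xtr = lagged(eeg_train, n_lags)
        w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(Xtr.shape[1]),
                            Xtr.T @ env_train)
        recon = lagged(eeg_test, n_lags) @ w
        picks = []
        for start in range(0, len(recon) - win + 1, win):
            seg = slice(start, start + win)
            r_a = np.corrcoef(recon[seg], env_a[seg])[0, 1]
            r_b = np.corrcoef(recon[seg], env_b[seg])[0, 1]
            picks.append("A" if r_a > r_b else "B")
        return picks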
Subjects
Speech Perception, Acoustic Stimulation/methods, Attention, Electroencephalography, Listening Effort, Pupil
ABSTRACT
Computational models of animal biosonar seek to identify critical aspects of echo processing responsible for the superior, real-time performance of echolocating bats and dolphins in target tracking and clutter rejection. The Spectrogram Correlation and Transformation (SCAT) model replicates aspects of biosonar imaging in both species by processing wideband biosonar sounds and echoes with auditory mechanisms identified from experiments with bats. The model acquires broadband biosonar broadcasts and echoes, represents them as time-frequency spectrograms using parallel bandpass filters, translates the filtered signals into ten parallel amplitude threshold levels, and then operates on the resulting time-of-occurrence values at each frequency to estimate overall echo range delay. It exploits the structure of the echo spectrum by depicting it as a series of local frequency nulls arranged regularly along the frequency axis of the spectrograms after dechirping them relative to the broadcast. Computations take place entirely on the timing of threshold-crossing events for each echo relative to threshold-crossing events for the broadcast. Threshold-crossing times take into account amplitude-latency trading, a physiological feature absent from conventional digital signal processing. Amplitude-latency trading transposes the profile of amplitudes across frequencies into a profile of time-registrations across frequencies. Target shape is extracted from the spacing of the object's individual acoustic reflecting points, or glints, using the mutual interference pattern of peaks and nulls in the echo spectrum. These glint estimates are merged with the overall range-delay estimate to produce a delay-based reconstruction of the object's distance as well as its glints. Clutter echoes indiscriminately activate multiple parts of the null-detecting system, which then produces the equivalent glint-delay spacings in images, thus blurring the overall echo-delay estimates by adding spurious glint delays to the image. Blurring acts as an anticorrelation process that rejects clutter intrusion into perceptions.
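The sketch below illustrates only the threshold-crossing front end in highly simplified form: a bandpass filterbank, envelope extraction, and a first-crossing time per frequency channel shifted by an assumed amplitude-latency trading term. The filter design, threshold, and trading coefficient are illustrative assumptions and do not reproduce the published SCAT implementation.

    # Assumed names and values throughout; SciPy is used only for the filterbank.
    import numpy as np
    from scipy.signal import butter, lfilter, hilbert

    def threshold_crossing_times(echo, fs, centre_freqs, thresh=0.1,
                                 trading_us_per_db=15.0):
        """Return one first-crossing latency per frequency channel, shifted by
        a simple amplitude-latency trading term (louder channels register
        earlier). Centre frequencies must lie safely below Nyquist."""
        latencies = []
        for fc in centre_freqs:
            b, a = butter(2, [0.9 * fc / (fs / 2), 1.1 * fc / (fs / 2)],
                          btype="band")
            env = np.abs(hilbert(lfilter(b, a, echo)))   # channel envelope
            above = np.nonzero(env > thresh)[0]
            if above.size == 0:
                latencies.append(np.nan)                 # no crossing here
                continue
            level_db = 20 * np.log10(env.max() + 1e-12)
            latencies.append(above[0] / fs
                             - trading_us_per_db * 1e-6 * level_db)
        return np.array(latencies)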
Subjects
Auditory Perception/physiology, Chiroptera/physiology, Dolphins/physiology, Echolocation/physiology, Sound, Algorithms, Animals, Computer Simulation, Light, Computer-Assisted Signal Processing, Software
ABSTRACT
Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at -20.7 dB SNR. Results were comparable to eight NH participants on the same task, who achieved 50% behavioral performance at -22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which degraded digit-recognition accuracy more at lower SNRs than at higher SNRs. Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more than in quiet. After introducing various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
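For concreteness, the sketch below shows only the second (classifier) stage as a small convolutional network mapping a simulated auditory-nerve response (fibres x time frames) to ten digit classes. The layer sizes and input dimensions are assumptions for illustration; the first-stage biophysical spike generator is not reproduced here.

    # Sketch of the classifier stage only; the neurogram shape is assumed.
    import torch
    import torch.nn as nn

    class DigitClassifier(nn.Module):
        def __init__(self, n_fibres=200, n_frames=100, n_digits=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2))
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * (n_fibres // 4) * (n_frames // 4), 128),
                nn.ReLU(),
                nn.Linear(128, n_digits))

        def forward(self, neurogram):        # (batch, 1, n_fibres, n_frames)
            return self.classifier(self.features(neurogram))

    model = DigitClassifier()
    dummy = torch.randn(4, 1, 200, 100)      # batch of simulated neurograms
    print(model(dummy).shape)                # torch.Size([4, 10])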
ABSTRACT
Auditory attention decoding (AAD) through a brain-computer interface has seen a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticography recordings. AAD has been pursued for its potential application to hearing-aid design, in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet system, despite the dry system having only one third as many EEG channels as the wet. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.
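The sketch below outlines, under assumed dimensions, how an end-to-end decoder of this kind can fold the similarity comparison into the network by taking an EEG window together with both candidate speech envelopes and emitting a two-way attention decision. The architecture shown is an illustrative stand-in, not the network evaluated in the paper.

    # Assumed input sizes: 64 EEG channels and 640-sample windows.
    import torch
    import torch.nn as nn

    class EndToEndAAD(nn.Module):
        def __init__(self, n_eeg=64, n_times=640, hidden=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(n_eeg + 2, hidden, kernel_size=9, padding=4),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1))
            self.head = nn.Linear(hidden, 2)   # logits for talker A vs. B

        def forward(self, eeg, env_a, env_b):
            # eeg: (batch, n_eeg, n_times); env_*: (batch, 1, n_times)
            x = torch.cat([eeg, env_a, env_b], dim=1)
            return self.head(self.encoder(x).squeeze(-1))

    model = EndToEndAAD()
    eeg = torch.randn(8, 64, 640)              # 8 windows of 64-channel EEG
    env_a = torch.randn(8, 1, 640)
    env_b = torch.randn(8, 1, 640)
    print(model(eeg, env_a, env_b).shape)      # torch.Size([8, 2])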