Search | VHL TCIM Americas

Neural encoding of musical expectations in a non-human primate.

Bianco, Roberta; Zuk, Nathaniel J; Bigand, Félix; Quarta, Eros; Grasso, Stefano; Arnese, Flavia; Ravignani, Andrea; Battaglia-Mayer, Alexandra; Novembre, Giacomo.

Curr Biol ; 34(2): 444-450.e5, 2024 01 22.

Article in English | MEDLINE | ID: mdl-38176416

ABSTRACT

The appreciation of music is a universal trait of humankind.1,2,3 Evidence supporting this notion includes the ubiquity of music across cultures4,5,6,7 and the natural predisposition toward music that humans display early in development.8,9,10 Are we musical animals because of species-specific predispositions? This question cannot be answered by relying on cross-cultural or developmental studies alone, as these cannot rule out enculturation.11 Instead, it calls for cross-species experiments testing whether homologous neural mechanisms underlying music perception are present in non-human primates. We present music to two rhesus monkeys, reared without musical exposure, while recording electroencephalography (EEG) and pupillometry. Monkeys exhibit higher engagement and neural encoding of expectations based on the previously seeded musical context when passively listening to real music as opposed to shuffled controls. We then compare human and monkey neural responses to the same stimuli and find a species-dependent contribution of two fundamental musical features-pitch and timing12-in generating expectations: while timing- and pitch-based expectations13 are similarly weighted in humans, monkeys rely on timing rather than pitch. Together, these results shed light on the phylogeny of music perception. They highlight monkeys' capacity for processing temporal structures beyond plain acoustic processing, and they identify a species-dependent contribution of time- and pitch-related features to the neural encoding of musical expectations.

Subject(s)

Music , Animals , Pitch Perception/physiology , Motivation , Electroencephalography/methods , Primates , Acoustic Stimulation , Auditory Perception/physiology

Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension.

Zhang, Xinmiao; Li, Jiawei; Li, Zhuoran; Hong, Bo; Diao, Tongxiang; Ma, Xin; Nolte, Guido; Engel, Andreas K; Zhang, Dan.

Neuroimage ; 282: 120404, 2023 11 15.

Article in English | MEDLINE | ID: mdl-37806465

ABSTRACT

Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via the natural language processing method as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms leading to speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.

Subject(s)

Comprehension , Speech Perception , Humans , Comprehension/physiology , Semantics , Speech/physiology , Speech Perception/physiology , Electroencephalography , Acoustics , Acoustic Stimulation

Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech.

Lindboom, Elsa; Nidiffer, Aaron; Carney, Laurel H; Lalor, Edmund C.

Hear Res ; 433: 108767, 2023 06.

Article in English | MEDLINE | ID: mdl-37060895

ABSTRACT

The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC) when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyzes resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.

Subject(s)

Auditory Cortex , Inferior Colliculi , Humans , Speech/physiology , Auditory Pathways/physiology , Electroencephalography , Auditory Cortex/physiology , Inferior Colliculi/physiology , Acoustic Stimulation/methods , Auditory Perception/physiology

EEG-based auditory attention decoding using speech-level-based segmented computational models.

Wang, Lei; Wu, Ed X; Chen, Fei.

J Neural Eng ; 18(4)2021 05 25.

Article in English | MEDLINE | ID: mdl-33957606

ABSTRACT

Objective.Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments.Approach.Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals based on corresponding EEG signals recorded when subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed with the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed based on segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated according to the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to those of traditional AAD methods with unified decoding functions.Main results.Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly with classification accuracies that approximated or exceeded 80% based on corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional directions. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times compared with the unified decoder.Significance.This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggests that this proposed computational model may be an effective method for the application of neuro-controlled brain-computer interfaces in complex auditory scenes.

Subject(s)

Speech Perception , Speech , Acoustic Stimulation , Attention , Computer Simulation , Electroencephalography , Humans

Prior Knowledge Guides Speech Segregation in Human Auditory Cortex.

Wang, Yuanye; Zhang, Jianfeng; Zou, Jiajie; Luo, Huan; Ding, Nai.

Cereb Cortex ; 29(4): 1561-1571, 2019 04 01.

Article in English | MEDLINE | ID: mdl-29788144

ABSTRACT

Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g. pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate 2 speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate that an effect in bilateral superior temporal gyrus and superior temporal sulcus is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation by mainly suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cue.

Subject(s)

Auditory Cortex/physiology , Cues , Speech Perception/physiology , Acoustic Stimulation , Adolescent , Adult , Attention , Female , Humans , Magnetoencephalography , Male , Speech Acoustics , Young Adult

Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions.

Crosse, Michael J; Butler, John S; Lalor, Edmund C.

J Neurosci ; 35(42): 14195-204, 2015 Oct 21.

Article in English | MEDLINE | ID: mdl-26490860

ABSTRACT

Congruent audiovisual speech enhances our ability to comprehend a speaker, even in noise-free conditions. When incongruent auditory and visual information is presented concurrently, it can hinder a listener's perception and even cause him or her to perceive information that was not presented in either modality. Efforts to investigate the neural basis of these effects have often focused on the special case of discrete audiovisual syllables that are spatially and temporally congruent, with less work done on the case of natural, continuous speech. Recent electrophysiological studies have demonstrated that cortical response measures to continuous auditory speech can be easily obtained using multivariate analysis methods. Here, we apply such methods to the case of audiovisual speech and, importantly, present a novel framework for indexing multisensory integration in the context of continuous speech. Specifically, we examine how the temporal and contextual congruency of ongoing audiovisual speech affects the cortical encoding of the speech envelope in humans using electroencephalography. We demonstrate that the cortical representation of the speech envelope is enhanced by the presentation of congruent audiovisual speech in noise-free conditions. Furthermore, we show that this is likely attributable to the contribution of neural generators that are not particularly active during unimodal stimulation and that it is most prominent at the temporal scale corresponding to syllabic rate (2-6 Hz). Finally, our data suggest that neural entrainment to the speech envelope is inhibited when the auditory and visual streams are incongruent both temporally and contextually. SIGNIFICANCE STATEMENT: Seeing a speaker's face as he or she talks can greatly help in understanding what the speaker is saying. This is because the speaker's facial movements relay information about what the speaker is saying, but also, importantly, when the speaker is saying it. Studying how the brain uses this timing relationship to combine information from continuous auditory and visual speech has traditionally been methodologically difficult. Here we introduce a new approach for doing this using relatively inexpensive and noninvasive scalp recordings. Specifically, we show that the brain's representation of auditory speech is enhanced when the accompanying visual speech signal shares the same timing. Furthermore, we show that this enhancement is most pronounced at a time scale that corresponds to mean syllable length.

Subject(s)

Evoked Potentials, Auditory/physiology , Evoked Potentials, Visual/physiology , Speech Perception/physiology , Visual Perception/physiology , Acoustic Stimulation , Adult , Analysis of Variance , Brain Mapping , Electroencephalography , Electromyography , Female , Humans , Male , Photic Stimulation , Reaction Time , Young Adult

Evidence for Neural Computations of Temporal Coherence in an Auditory Scene and Their Enhancement during Active Listening.

O'Sullivan, James A; Shamma, Shihab A; Lalor, Edmund C.

J Neurosci ; 35(18): 7256-63, 2015 May 06.

Article in English | MEDLINE | ID: mdl-25948273

ABSTRACT

The human brain has evolved to operate effectively in highly complex acoustic environments, segregating multiple sound sources into perceptually distinct auditory objects. A recent theory seeks to explain this ability by arguing that stream segregation occurs primarily due to the temporal coherence of the neural populations that encode the various features of an individual acoustic source. This theory has received support from both psychoacoustic and functional magnetic resonance imaging (fMRI) studies that use stimuli which model complex acoustic environments. Termed stochastic figure-ground (SFG) stimuli, they are composed of a "figure" and background that overlap in spectrotemporal space, such that the only way to segregate the figure is by computing the coherence of its frequency components over time. Here, we extend these psychoacoustic and fMRI findings by using the greater temporal resolution of electroencephalography to investigate the neural computation of temporal coherence. We present subjects with modified SFG stimuli wherein the temporal coherence of the figure is modulated stochastically over time, which allows us to use linear regression methods to extract a signature of the neural processing of this temporal coherence. We do this under both active and passive listening conditions. Our findings show an early effect of coherence during passive listening, lasting from â¼115 to 185 ms post-stimulus. When subjects are actively listening to the stimuli, these responses are larger and last longer, up to â¼265 ms. These findings provide evidence for early and preattentive neural computations of temporal coherence that are enhanced by active analysis of an auditory scene.

Subject(s)

Acoustic Stimulation/methods , Auditory Pathways/physiology , Auditory Perception/physiology , Brain Mapping/methods , Psychoacoustics , Adult , Electroencephalography/methods , Female , Humans , Magnetic Resonance Imaging/methods , Male , Time Factors , Young Adult

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL