1.
Sci Rep ; 14(1): 19105, 2024 08 17.
Article in English | MEDLINE | ID: mdl-39154048

ABSTRACT

The multivariate temporal response function (mTRF) is an effective tool for investigating the neural encoding of acoustic and complex linguistic features in natural continuous speech. In this study, we investigated how neural representations of speech features derived from natural stimuli are related to early signs of cognitive decline in older adults, taking into account the effects of hearing. Participants without (n = 25) and with (n = 19) early signs of cognitive decline listened to an audiobook while their electroencephalography responses were recorded. Using the mTRF framework, we modeled the relationship between speech input and neural response via different acoustic, segmented and linguistic encoding models and examined the response functions in terms of encoding accuracy, signal power, peak amplitudes and latencies. Our results showed no significant effect of cognitive decline or hearing ability on the neural encoding of acoustic and linguistic speech features. However, we found a significant interaction between hearing ability and the word-level segmentation model, suggesting that hearing impairment specifically affects encoding accuracy for this model, while other features were not affected by hearing ability. These results suggest that while speech processing markers remain unaffected by cognitive decline and hearing loss per se, neural encoding of word-level segmented speech features in older adults is affected by hearing loss but not by cognitive decline. This study emphasises the effectiveness of mTRF analysis in studying the neural encoding of speech and argues for an extension of research to investigate its clinical impact on hearing loss and cognition.
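
As background for the mTRF framework mentioned above, here is a minimal, illustrative sketch (not the authors' code) of a single-feature temporal response function fitted by ridge regression; the sampling rate, lag window, regularization value, and simulated data are all hypothetical.

```python
import numpy as np

def lagged_design(stim, lags):
    """Stack time-lagged copies of the stimulus features (n_times, n_features)."""
    n_times, n_feat = stim.shape
    X = np.zeros((n_times, n_feat * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(stim, lag, axis=0)
        shifted[:lag] = 0  # zero out samples wrapped around by np.roll
        X[:, i * n_feat:(i + 1) * n_feat] = shifted
    return X

def fit_trf(stim, eeg, lags, alpha=1.0):
    """Ridge-regression encoding model mapping lagged stimulus features to EEG."""
    X = lagged_design(stim, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ eeg)

# toy data: a 64 Hz speech envelope predicting one simulated EEG channel
fs = 64
rng = np.random.default_rng(0)
envelope = rng.standard_normal((fs * 60, 1))
eeg = np.convolve(envelope[:, 0], np.hanning(16), mode="same")[:, None]

lags = range(0, int(0.5 * fs))          # 0-500 ms response window
w = fit_trf(envelope, eeg, lags, alpha=10.0)
pred = lagged_design(envelope, lags) @ w
encoding_accuracy = np.corrcoef(pred[:, 0], eeg[:, 0])[0, 1]
```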


Subject(s)
Cognitive Dysfunction , Electroencephalography , Hearing Loss , Speech Perception , Humans , Male , Female , Aged , Cognitive Dysfunction/physiopathology , Hearing Loss/physiopathology , Speech Perception/physiology , Speech/physiology , Middle Aged , Cues , Linguistics , Acoustic Stimulation , Aged, 80 and over
2.
Front Hum Neurosci ; 17: 1253211, 2023.
Article in English | MEDLINE | ID: mdl-37727862

ABSTRACT

Introduction: Speech production involves neurological planning and articulatory execution. How speakers prepare for articulation is a significant aspect of speech production research. Previous studies have focused on isolated words or short phrases to explore speech planning mechanisms linked to articulatory behaviors, including investigating the eye-voice span (EVS) during text reading. However, these experimental paradigms do not replicate real-world speech processes. Additionally, our understanding of the neurological dimension of speech planning remains limited. Methods: This study examines speech planning mechanisms during continuous speech production by analyzing behavioral (eye movement and speech) and neurophysiological (EEG) data collected during a continuous speech production task. The study specifically investigates the influence of semantic consistency on speech planning and the occurrence of "look ahead" behavior. Results: The outcomes reveal the pivotal role of semantic coherence in facilitating fluent speech production. Speakers access lexical representations and phonological information before initiating speech, emphasizing the significance of semantic processing in speech planning. Behaviorally, the EVS decreases progressively during continuous reading of regular sentences, with a slight increase for non-regular sentences. Moreover, eye movement pattern analysis identifies two distinct speech production modes, highlighting the importance of semantic comprehension and prediction in higher-level lexical processing. Neurologically, the dual pathway model of speech production is supported, indicating a dorsal information flow and frontal lobe involvement. The brain network linked to semantic understanding exhibits a negative correlation with semantic coherence, with significant activation during semantic incoherence and suppression in regular sentences. Discussion: The study's findings enhance comprehension of speech planning mechanisms and offer insights into the role of semantic coherence in continuous speech production. Furthermore, the research methodology establishes a valuable framework for future investigations in this domain.

3.
Brain Sci ; 13(7)2023 Jul 17.
Article in English | MEDLINE | ID: mdl-37509014

ABSTRACT

Background noise elicits listening effort. What else is tinnitus if not an endogenous background noise? From such reasoning, we hypothesized increased listening effort in tinnitus patients during listening tasks. We tested this hypothesis by investigating indices of listening effort derived from electroencephalography and skin conductance, particularly parietal and frontal alpha power and electrodermal activity (EDA). Furthermore, tinnitus distress questionnaires (THI and TQ12-I) were employed. Parietal alpha values were positively correlated with TQ12-I scores, and both were negatively correlated with EDA; pre-stimulus frontal alpha correlated with the THI score in our pilot study; finally, results showed a general trend of increased frontal alpha activity in the tinnitus group compared with the control group. Parietal alpha during listening to the stimuli, positively correlated with the TQ12-I, appears to reflect higher listening effort in tinnitus patients and the perception of tinnitus symptoms. The negative correlation of both listening effort (parietal alpha) and perceived tinnitus symptoms (TQ12-I scores) with EDA levels could be explained by a sympathetic nervous system that is less able to prepare the body to expend increased energy during the "fight or flight" response, because energy is depleted by tinnitus perception.
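
For readers unfamiliar with the measures, a minimal sketch of how band-limited alpha power can be estimated and correlated with questionnaire and EDA values; this is illustrative only, with simulated data and arbitrary parameters, not the study's pipeline.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import spearmanr

def alpha_power(eeg, fs, band=(8.0, 12.0)):
    """Mean spectral power in the alpha band for one channel (or channel average)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

# hypothetical per-subject data: parietal EEG segments, TQ12-I scores, EDA levels
fs = 250
rng = np.random.default_rng(1)
parietal_alpha = [alpha_power(rng.standard_normal(fs * 30), fs) for _ in range(20)]
tq12 = rng.integers(0, 25, size=20)
eda = rng.random(20)

print(spearmanr(parietal_alpha, tq12))  # alpha vs. tinnitus distress
print(spearmanr(parietal_alpha, eda))   # alpha vs. electrodermal activity
```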

4.
Bioengineering (Basel) ; 10(5)2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37237601

ABSTRACT

Parkinson's disease is a progressive neurodegenerative disorder caused by dopaminergic neuron degeneration. Parkinsonian speech impairment is one of the earliest presentations of the disease and, along with tremor, is suitable for pre-diagnosis. It presents as hypokinetic dysarthria, with respiratory, phonatory, articulatory, and prosodic manifestations. This article targets artificial-intelligence-based identification of Parkinson's disease from continuous speech recorded in a noisy environment. The novelty of this work is twofold. First, the proposed assessment workflow performs speech analysis on samples of continuous speech. Second, we analyzed and quantified the applicability of the Wiener filter for speech denoising in the context of Parkinsonian speech identification. We argue that the Parkinsonian features of loudness, intonation, phonation, prosody, and articulation are contained in the speech signal, the speech energy, and the Mel spectrograms. Thus, the proposed workflow follows a feature-based speech assessment to determine the feature variation ranges, followed by speech classification using convolutional neural networks. We report the best classification accuracies of 96% on speech energy, 93% on speech, and 92% on Mel spectrograms. We conclude that the Wiener filter improves both feature-based analysis and convolutional-neural-network-based classification performance.
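
A rough sketch of the preprocessing described above (Wiener denoising followed by speech-energy and Mel-spectrogram feature extraction), assuming the librosa and SciPy libraries; the file name and parameter values are placeholders, not the paper's settings.

```python
import numpy as np
import librosa
from scipy.signal import wiener

# load a continuous-speech recording (file name is hypothetical)
y, sr = librosa.load("speech_sample.wav", sr=16000)

# denoise with a Wiener filter before feature extraction
y_denoised = wiener(y, mysize=29)

# speech energy (RMS) and Mel spectrogram as candidate CNN inputs
rms = librosa.feature.rms(y=y_denoised)
mel = librosa.feature.melspectrogram(y=y_denoised, sr=sr, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log-Mel image for a CNN
```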

5.
Neurobiol Lang (Camb) ; 4(1): 29-52, 2023.
Article in English | MEDLINE | ID: mdl-37229141

ABSTRACT

Partial speech input is often understood to trigger rapid and automatic activation of successively higher-level representations of words, from sound to meaning. Here we show evidence from magnetoencephalography that this type of incremental processing is limited when words are heard in isolation as compared to continuous speech. This suggests a less unified and automatic word recognition process than is often assumed. We present evidence from isolated words that neural effects of phoneme probability, quantified by phoneme surprisal, are significantly stronger than (statistically null) effects of phoneme-by-phoneme lexical uncertainty, quantified by cohort entropy. In contrast, we find robust effects of both cohort entropy and phoneme surprisal during perception of connected speech, with a significant interaction between the contexts. This dissociation rules out models of word recognition in which phoneme surprisal and cohort entropy are common indicators of a uniform process, even though these closely related information-theoretic measures both arise from the probability distribution of wordforms consistent with the input. We propose that phoneme surprisal effects reflect automatic access of a lower level of representation of the auditory input (e.g., wordforms) while the occurrence of cohort entropy effects is task sensitive, driven by a competition process or a higher-level representation that is engaged late (or not at all) during the processing of single words.
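
To make the two information-theoretic measures concrete, here is a toy, illustrative computation of phoneme surprisal and cohort entropy over a miniature lexicon; the lexicon and probabilities are invented for illustration, not taken from the study.

```python
import numpy as np

# toy lexicon with relative frequencies (hypothetical values)
lexicon = {"cat": 0.4, "cap": 0.3, "can": 0.2, "dog": 0.1}

def cohort(prefix):
    """Wordforms consistent with the phonemes heard so far, renormalized."""
    members = {w: p for w, p in lexicon.items() if w.startswith(prefix)}
    total = sum(members.values())
    return {w: p / total for w, p in members.items()}

def phoneme_surprisal(prefix, next_phoneme):
    """-log2 P(next phoneme | phonemes so far), from the cohort distribution."""
    before = cohort(prefix)
    p_next = sum(p for w, p in before.items() if w.startswith(prefix + next_phoneme))
    return -np.log2(p_next)

def cohort_entropy(prefix):
    """Uncertainty over which word is being heard, given the phonemes so far."""
    probs = np.array(list(cohort(prefix).values()))
    return -np.sum(probs * np.log2(probs))

print(phoneme_surprisal("ca", "t"))  # surprisal of hearing /t/ after "ca"
print(cohort_entropy("ca"))          # entropy over the cohort {cat, cap, can}
```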

6.
Neurobiol Lang (Camb) ; 4(2): 318-343, 2023.
Article in English | MEDLINE | ID: mdl-37229509

ABSTRACT

Speech processing often occurs amid competing inputs from other modalities, for example, listening to the radio while driving. We examined the extent to which dividing attention between auditory and visual modalities (bimodal divided attention) impacts neural processing of natural continuous speech from acoustic to linguistic levels of representation. We recorded electroencephalographic (EEG) responses when human participants performed a challenging primary visual task, imposing low or high cognitive load while listening to audiobook stories as a secondary task. The two dual-task conditions were contrasted with an auditory single-task condition in which participants attended to stories while ignoring visual stimuli. Behaviorally, the high load dual-task condition was associated with lower speech comprehension accuracy relative to the other two conditions. We fitted multivariate temporal response function encoding models to predict EEG responses from acoustic and linguistic speech features at different representation levels, including auditory spectrograms and information-theoretic models of sublexical-, word-form-, and sentence-level representations. Neural tracking of most acoustic and linguistic features remained unchanged with increasing dual-task load, despite unambiguous behavioral and neural evidence of the high load dual-task condition being more demanding. Compared to the auditory single-task condition, dual-task conditions selectively reduced neural tracking of only some acoustic and linguistic features, mainly at latencies >200 ms, while earlier latencies were surprisingly unaffected. These findings indicate that behavioral effects of bimodal divided attention on continuous speech processing occur not because of impaired early sensory representations but likely at later cognitive processing stages. Crossmodal attention-related mechanisms may not be uniform across different speech processing levels.

7.
Int J Audiol ; 62(3): 199-208, 2023 03.
Article in English | MEDLINE | ID: mdl-35152811

ABSTRACT

OBJECTIVE: To explore the detection of cortical responses to continuous speech using a single EEG channel, and in particular to compare detection rates and times using a cross-correlation approach and parameters extracted from the temporal response function (TRF). DESIGN: 32-channel EEG was recorded while 25 min of continuous English speech was presented. Detection parameters were the cross-correlation between speech and EEG (XCOR), the peak value and power of the TRF filter (TRF-peak and TRF-power), and the correlation between the TRF-predicted EEG and the true EEG (TRF-COR). A bootstrap analysis was used to determine the statistical significance of responses. Different electrode configurations were compared: using single channels Cz or Fz, or selecting the channels with the highest correlation values. STUDY SAMPLE: Seventeen native English-speaking subjects with mild-to-moderate hearing loss. RESULTS: Significant cortical responses were detected from all subjects at channel Fz with XCOR and TRF-COR. Detection time was lower for XCOR (mean = 4.8 min) than for the TRF parameters (best: TRF-COR, mean = 6.4 min), with significant time differences between XCOR and TRF-peak and TRF-power. Analysing multiple EEG channels and testing the channels with the highest correlation between envelope and EEG reduced detection sensitivity compared to Fz alone. CONCLUSIONS: Cortical responses to continuous speech can be detected from a single channel with recording times that may be suitable for clinical application.
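
A simplified sketch of the cross-correlation (XCOR) detection idea with a surrogate-based significance test; this is an illustration with simulated inputs, and the shift-based bootstrap used here is only one plausible way to build the null distribution, not necessarily the authors' procedure.

```python
import numpy as np

def xcor_peak(envelope, eeg, fs, max_lag_s=0.3):
    """Peak absolute correlation between the speech envelope and EEG across lags."""
    vals = []
    for lag in range(int(max_lag_s * fs)):
        a = envelope[:len(envelope) - lag] if lag else envelope
        vals.append(np.corrcoef(a, eeg[lag:])[0, 1])
    return np.max(np.abs(vals))

def detection_p(envelope, eeg, fs, n_surrogates=200, seed=0):
    """p-value from a null distribution built with circularly shifted envelopes."""
    rng = np.random.default_rng(seed)
    observed = xcor_peak(envelope, eeg, fs)
    null = [xcor_peak(np.roll(envelope, rng.integers(fs, len(envelope) - fs)), eeg, fs)
            for _ in range(n_surrogates)]
    return float(np.mean(np.array(null) >= observed))

fs = 64
rng = np.random.default_rng(1)
envelope = rng.standard_normal(fs * 300)  # 5 min of simulated envelope
eeg = 0.1 * np.roll(envelope, int(0.15 * fs)) + rng.standard_normal(fs * 300)
print(xcor_peak(envelope, eeg, fs), detection_p(envelope, eeg, fs))
```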


Subject(s)
Hearing Loss , Speech Perception , Humans , Electroencephalography , Speech , Speech Perception/physiology
8.
Front Neurosci ; 16: 906616, 2022.
Article in English | MEDLINE | ID: mdl-36061597

ABSTRACT

Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which the auditory prosthesis improves the user's speech comprehension. Although behavior-based speech intelligibility is the gold standard, precise evaluation is limited by its subjectivity. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, and 8-channel vocoder were used to set relatively low speech intelligibility conditions. A Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100%, with a step of 2.5%; three scores (30.0, 37.5, and 40.0%) were not collected. The speech features, i.e., the speech temporal envelope (ENV) and phoneme (PH) onsets, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained on datasets of event-related potentials (ERPs), of correlation coefficients between the ERPs and the ENV, between the ERPs and PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. According to the occlusion sensitivity maps, the informative electrodes of the ENV model were located in the occipital area, whereas those of the phoneme-based models (PH and PHENV) were located in language-processing areas. Of the models tested, the PHENV model obtained the best speech intelligibility prediction accuracy and may facilitate clinical prediction of speech intelligibility with a more comfortable speech intelligibility test.
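
As an illustration of the kind of model involved (not the authors' architecture), a minimal convolutional network in PyTorch that maps an EEG-derived feature map to one of 41 discrete intelligibility levels; layer sizes and input dimensions are placeholders.

```python
import torch
import torch.nn as nn

class IntelligibilityCNN(nn.Module):
    """Toy CNN: (batch, channels, time) EEG features -> 41 intelligibility classes."""
    def __init__(self, n_channels=64, n_classes=41):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))

model = IntelligibilityCNN()
dummy = torch.randn(8, 64, 512)  # 8 trials of 64-channel input
logits = model(dummy)            # shape: (8, 41)
```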

9.
Front Hum Neurosci ; 16: 894676, 2022.
Article in English | MEDLINE | ID: mdl-35937674

ABSTRACT

Previous neuroimaging investigations of overt speech production in adults who stutter (AWS) found increased motor and decreased auditory activity compared to controls. Activity in the auditory cortex is heightened, however, under fluency-inducing conditions in which AWS temporarily become fluent while synchronizing their speech with an external rhythm, such as a metronome or another speaker. These findings suggest that stuttering is associated with disrupted auditory motor integration. Technical challenges in acquiring neuroimaging data during continuous overt speech production have limited experimental paradigms to short or covert speech tasks. Such paradigms are not ideal, as stuttering primarily occurs during longer speaking tasks. To address this gap, we used a validated spatial ICA technique designed to address speech movement artifacts during functional magnetic resonance imaging (fMRI) scanning. We compared brain activity and functional connectivity of the left auditory cortex during continuous speech production in two conditions: solo (stutter-prone) and choral (fluency-inducing) reading tasks. Overall, brain activity differences in AWS relative to controls in the two conditions were similar, showing expected patterns of hyperactivity in premotor/motor regions but underactivity in auditory regions. Functional connectivity of the left auditory cortex (STG) showed that within the AWS group there was increased correlated activity with the right insula and inferior frontal area during choral speech. The AWS also exhibited heightened connectivity between left STG and key regions of the default mode network (DMN) during solo speech. These findings indicate possible interference by the DMN during natural, stuttering-prone speech in AWS, and that enhanced coordination between auditory and motor regions may support fluent speech.

10.
Brain Lang ; 230: 105128, 2022 07.
Article in English | MEDLINE | ID: mdl-35537247

ABSTRACT

Listeners regularly comprehend continuous speech despite noisy conditions. Previous studies show that neural tracking of speech degrades under noise, predicts comprehension, and increases for non-native listeners. We test the hypothesis that listeners similarly increase tracking for both L2 and noisy L1 speech, after adjusting for comprehension. Twenty-four Chinese-English bilinguals underwent EEG while listening to one hour of an audiobook, mixed with three levels of noise, in Mandarin and English and answered comprehension questions. We estimated tracking of the speech envelope in EEG for each one-minute segment using the multivariate temporal response function (mTRF). Contrary to our prediction, L2 tracking was significantly lower than L1, while L1 tracking significantly increased with noise maskers without reducing comprehension. However, greater L2 proficiency was positively associated with greater L2 tracking. We discuss how studies of speech envelope tracking using noise and bilingualism might be reconciled through a focus on exerted rather than demanded effort.
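
A small sketch of how a broadband speech envelope is typically extracted before envelope-tracking analyses, assuming librosa and SciPy; the file name, rates, and the compressive exponent are illustrative choices, not the study's exact pipeline.

```python
import numpy as np
from scipy.signal import hilbert, resample_poly
import librosa

# hypothetical file name; envelope extraction for mTRF-style tracking analyses
y, sr = librosa.load("audiobook_segment.wav", sr=16000)

envelope = np.abs(hilbert(y))                         # broadband amplitude envelope
envelope = resample_poly(envelope, up=128, down=sr)   # downsample to 128 Hz (EEG rate)
envelope = np.power(envelope, 0.6)                    # optional compressive nonlinearity
```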


Subject(s)
Multilingualism , Speech Perception , Humans , Language , Noise , Speech , Speech Perception/physiology
11.
Neurosci Biobehav Rev ; 133: 104506, 2022 02.
Article in English | MEDLINE | ID: mdl-34942267

ABSTRACT

BACKGROUND: Cortical entrainment has emerged as a promising means for measuring continuous speech processing in young, neurotypical adults. However, its utility for capturing atypical speech processing has not been systematically reviewed. OBJECTIVES: Synthesize evidence regarding the merit of measuring cortical entrainment to capture atypical speech processing and recommend avenues for future research. METHOD: We systematically reviewed publications investigating entrainment to continuous speech in populations with auditory processing differences. RESULTS: In the 25 publications reviewed, most studies were conducted on older and/or hearing-impaired adults, for whom slow-wave entrainment to speech was often heightened compared to controls. Research conducted on populations with neurodevelopmental disorders, in whom slow-wave entrainment was often reduced, was less common. Across publications, findings highlighted associations between cortical entrainment and speech processing performance differences. CONCLUSIONS: Measures of cortical entrainment offer a useful means of capturing speech processing differences and future research should leverage them more extensively when studying populations with neurodevelopmental disorders.


Subject(s)
Auditory Cortex , Speech Perception , Acoustic Stimulation , Adult , Auditory Perception , Humans , Speech
12.
Front Psychol ; 13: 1076339, 2022.
Article in English | MEDLINE | ID: mdl-36619132

ABSTRACT

Language is fundamentally predictable, both at a higher schematic level and at the level of low-level lexical items. Regarding predictability at the lexical level, collocations are frequent co-occurrences of words that are often characterized by a high strength of association. So far, psycho- and neurolinguistic studies have mostly employed highly artificial experimental paradigms in the investigation of collocations, focusing on the processing of single words or isolated sentences. In contrast, here we analyze EEG brain responses recorded during stimulation with continuous speech, i.e., audiobooks. We find that the N400 response to collocations differs significantly from that to non-collocations, although the effect varies with cortical region (anterior/posterior) and laterality (left/right). Our results are in line with studies using continuous speech, and they mostly contradict those using artificial paradigms and stimuli. To the best of our knowledge, this is the first neurolinguistic study of collocations using continuous speech stimulation.
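
To make "strength of association" concrete, a toy pointwise-mutual-information (PMI) computation over an invented miniature corpus; PMI is only one common association measure and is not necessarily the one used in the study.

```python
import math
from collections import Counter

def pmi(tokens, w1, w2, window=2):
    """Pointwise mutual information of w2 following w1 within a small window."""
    n = len(tokens)
    counts = Counter(tokens)
    pair = sum(1 for i, tok in enumerate(tokens)
               if tok == w1 and w2 in tokens[i + 1:i + 1 + window])
    if pair == 0:
        return float("-inf")
    return math.log2((pair / n) / ((counts[w1] / n) * (counts[w2] / n)))

tokens = "strong tea and strong coffee but powerful computer and strong tea again".split()
print(pmi(tokens, "strong", "tea"))    # frequent co-occurrence: positive PMI
print(pmi(tokens, "powerful", "tea"))  # never co-occur here: -inf
```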

13.
Front Neurosci ; 16: 963629, 2022.
Article in English | MEDLINE | ID: mdl-36711133

ABSTRACT

In recent years, temporal response function (TRF) analyses of neural activity recordings evoked by continuous naturalistic stimuli have become increasingly popular for characterizing response properties within the auditory hierarchy. However, despite this rise in TRF usage, relatively few educational resources for these tools exist. Here we use a dual-talker continuous speech paradigm to demonstrate how a key parameter of experimental design, the quantity of acquired data, influences TRF analyses fit to either individual data (subject-specific analyses), or group data (generic analyses). We show that although model prediction accuracy increases monotonically with data quantity, the amount of data required to achieve significant prediction accuracies can vary substantially based on whether the fitted model contains densely (e.g., acoustic envelope) or sparsely (e.g., lexical surprisal) spaced features, especially when the goal of the analyses is to capture the aspect of neural responses uniquely explained by specific features. Moreover, we demonstrate that generic models can exhibit high performance on small amounts of test data (2-8 min), if they are trained on a sufficiently large data set. As such, they may be particularly useful for clinical and multi-task study designs with limited recording time. Finally, we show that the regularization procedure used in fitting TRF models can interact with the quantity of data used to fit the models, with larger training quantities resulting in systematically larger TRF amplitudes. Together, demonstrations in this work should aid new users of TRF analyses, and in combination with other tools, such as piloting and power analyses, may serve as a detailed reference for choosing acquisition duration in future studies.
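
An illustrative simulation (with made-up data) of how held-out prediction accuracy can be examined as a function of training quantity and ridge regularization, in the spirit of the analyses described above; scikit-learn's Ridge stands in for a TRF fit.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
fs = 64
X = rng.standard_normal((fs * 60 * 30, 16))   # 30 min of lagged stimulus features
true_w = rng.standard_normal(16)
y = X @ true_w + rng.standard_normal(len(X))  # simulated single-channel EEG

test_X, test_y = X[-fs * 300:], y[-fs * 300:]  # fixed 5-min held-out test set
for minutes in (2, 5, 10, 25):
    n = fs * 60 * minutes
    for alpha in (1e0, 1e3, 1e6):
        model = Ridge(alpha=alpha).fit(X[:n], y[:n])
        r = np.corrcoef(model.predict(test_X), test_y)[0, 1]
        print(f"{minutes:>2} min, alpha={alpha:g}: r = {r:.3f}")
```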

14.
Eur J Neurosci ; 55(11-12): 3288-3302, 2022 06.
Article in English | MEDLINE | ID: mdl-32687616

ABSTRACT

Making sense of a poor auditory signal can pose a challenge. Previous attempts to quantify speech intelligibility in neural terms have usually focused on one of two measures, namely low-frequency speech-brain synchronization or alpha power modulations. However, reports have been mixed concerning the modulation of these measures, an issue aggravated by the fact that they have normally been studied separately. We present two MEG studies analyzing both measures. In study 1, participants listened to unimodal auditory speech with three different levels of degradation (original, 7-channel and 3-channel vocoding). Intelligibility declined with declining clarity, but speech was still intelligible to some extent even for the lowest clarity level (3-channel vocoding). Low-frequency (1-7 Hz) speech tracking suggested a U-shaped relationship with strongest effects for the medium-degraded speech (7-channel) in bilateral auditory and left frontal regions. To follow up on this finding, we implemented three additional vocoding levels (5-channel, 2-channel and 1-channel) in a second MEG study. Using this wider range of degradation, the speech-brain synchronization showed a similar pattern as in study 1, but further showed that when speech becomes unintelligible, synchronization declines again. The relationship differed for alpha power, which continued to decrease across vocoding levels reaching a floor effect for 5-channel vocoding. Predicting subjective intelligibility based on models either combining both measures or each measure alone showed superiority of the combined model. Our findings underline that speech tracking and alpha power are modified differently by the degree of degradation of continuous speech but together contribute to the subjective speech understanding.
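
For orientation, a minimal sketch of the two measures named above, low-frequency speech-brain coherence and alpha power, computed on simulated signals with SciPy; the frequency bands and parameters are common choices, not necessarily those of the MEG studies.

```python
import numpy as np
from scipy.signal import coherence, welch

fs = 200
rng = np.random.default_rng(0)
envelope = rng.standard_normal(fs * 120)  # placeholder speech envelope
meg = 0.3 * np.roll(envelope, int(0.1 * fs)) + rng.standard_normal(fs * 120)

# low-frequency (1-7 Hz) speech-brain coherence
f, coh = coherence(envelope, meg, fs=fs, nperseg=4 * fs)
speech_tracking = coh[(f >= 1) & (f <= 7)].mean()

# alpha power (8-12 Hz) of the neural signal
f_psd, psd = welch(meg, fs=fs, nperseg=4 * fs)
alpha_power = psd[(f_psd >= 8) & (f_psd <= 12)].mean()
```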


Subject(s)
Speech Perception , Brain , Brain Mapping , Humans , Speech Intelligibility
15.
Neuroimage ; 245: 118720, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34774771

ABSTRACT

Accurate localization of brain regions responsible for language and cognitive functions in epilepsy patients is important. Electrocorticography (ECoG)-based real-time functional mapping (RTFM) has been shown to be a safer alternative to electrical cortical stimulation mapping (ESM), which is currently the clinical/gold standard. Conventional methods for analyzing RTFM data mostly account for the ECoG signal in certain frequency bands, especially high gamma. Compared to ESM, they have limited accuracy when assessing channel responses. In the present study, we developed a novel RTFM method based on tensor component analysis (TCA) to address the limitations of current estimation methods. Our approach analyzes the whole frequency spectrum of the ECoG signal during natural continuous speech. We construct third-order tensors that contain multichannel time-frequency information and use TCA to extract low-dimensional temporal, spectral and spatial modes. Temporal modulation scores (correlation values) are then calculated between the time series of voice envelope features and TCA-estimated temporal courses, and significant temporal modulation determines which components' channel weightings are displayed to the neurosurgeon as a guide for follow-up ESM. In our experiments, data from thirteen patients with refractory epilepsy were recorded during preoperative evaluation for their epileptogenic zones (EZs), which were located adjacent to the eloquent cortex. Our results showed higher detection accuracy of our proposed method in a narrative speech task, suggesting that our method complements ESM and is an improvement over the prior RTFM method. To our knowledge, this is the first TCA-based method to pinpoint language-specific brain regions during continuous speech that uses whole-band ECoG.
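
A minimal sketch of the tensor-decomposition step, assuming the tensorly library's CP/PARAFAC implementation; the tensor dimensions, rank, and the voice-envelope placeholder are hypothetical and only illustrate the workflow of extracting spatial, spectral, and temporal modes and scoring temporal modes against a stimulus feature.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# hypothetical third-order ECoG tensor: channels x frequencies x time
rng = np.random.default_rng(0)
tensor = tl.tensor(rng.random((60, 40, 500)))

# CP/PARAFAC decomposition into low-dimensional spatial, spectral, temporal modes
weights, (spatial, spectral, temporal) = parafac(tensor, rank=5, n_iter_max=200)

# correlate each temporal mode with a voice-envelope feature time series
voice_envelope = rng.random(500)  # placeholder stimulus feature
scores = [np.corrcoef(temporal[:, k], voice_envelope)[0, 1] for k in range(5)]
```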


Subject(s)
Brain Mapping/methods , Craniotomy , Electrocorticography , Epilepsy/surgery , Speech/physiology , Wakefulness , Adolescent , Adult , Child , Child, Preschool , China , Female , Humans , Male
16.
J Neurophysiol ; 126(3): 791-802, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34232756

ABSTRACT

Auditory processing is affected by advancing age and hearing loss, but the underlying mechanisms are still unclear. We investigated the effects of age and hearing loss on temporal processing of naturalistic stimuli in the auditory system. We used a recently developed objective measure for neural phase-locking to the fundamental frequency of the voice (f0) which uses continuous natural speech as a stimulus, that is, "f0-tracking." The f0-tracking responses from 54 normal-hearing and 14 hearing-impaired adults of varying ages were analyzed. The responses were evoked by a Flemish story with a male talker and contained contributions from both subcortical and cortical sources. Results indicated that advancing age was related to smaller responses with less cortical response contributions. This is consistent with an age-related decrease in neural phase-locking ability at frequencies in the range of the f0, possibly due to decreased inhibition in the auditory system. Conversely, hearing-impaired subjects displayed larger responses compared with age-matched normal-hearing controls. This was due to additional cortical response contributions in the 38- to 50-ms latency range, which were stronger for participants with more severe hearing loss. This is consistent with hearing-loss-induced cortical reorganization and recruitment of additional neural resources to aid in speech perception. NEW & NOTEWORTHY: Previous studies disagree on the effects of age and hearing loss on the neurophysiological processing of the fundamental frequency of the voice (f0), in part due to confounding effects. Using a novel electrophysiological technique, natural speech stimuli, and controlled study design, we quantified and disentangled the effects of age and hearing loss on neural f0 processing. We uncovered evidence for underlying neurophysiological mechanisms, including a cortical compensation mechanism for hearing loss, but not for age.


Subject(s)
Adaptation, Physiological , Cerebral Cortex/physiology , Hearing Loss/physiopathology , Speech Acoustics , Speech Perception , Adolescent , Adult , Aged , Aged, 80 and over , Auditory Pathways/physiology , Auditory Pathways/physiopathology , Cerebral Cortex/cytology , Cerebral Cortex/growth & development , Cerebral Cortex/physiopathology , Evoked Potentials, Auditory , Female , Humans , Male , Middle Aged , Reaction Time
17.
Neuropsychologia ; 158: 107883, 2021 07 30.
Article in English | MEDLINE | ID: mdl-33989647

ABSTRACT

Pitch accents are local pitch patterns that convey differences in word prominence and modulate the information structure of the discourse. Despite the importance to discourse in languages like English, neural processing of pitch accents remains understudied. The current study investigates the neural processing of pitch accents by native and non-native English speakers while they are listening to or ignoring 45 min of continuous, natural speech. Leveraging an approach used to study phonemes in natural speech, we analyzed thousands of electroencephalography (EEG) segments time-locked to pitch accents in a prosodic transcription. The optimal neural discrimination between pitch accent categories emerged at latencies between 100 and 200 ms. During these latencies, we found a strong structural alignment between neural and phonetic representations of pitch accent categories. In the same latencies, native listeners exhibited more robust processing of pitch accent contrasts than non-native listeners. However, these group differences attenuated when the speech signal was ignored. We can reliably capture the neural processing of discrete and contrastive pitch accent categories in continuous speech. Our analytic approach also captures how language-specific knowledge and selective attention influences the neural processing of pitch accent categories.


Subject(s)
Speech Perception , Speech , Auditory Perception , Humans , Language , Phonetics
18.
Eur J Neurosci ; 53(11): 3640-3653, 2021 06.
Article in English | MEDLINE | ID: mdl-33861480

ABSTRACT

Traditional electrophysiological methods to study temporal auditory processing of the fundamental frequency of the voice (f0) often use unnaturally repetitive stimuli. In this study, we investigated f0 processing of meaningful continuous speech. EEG responses evoked by stories in quiet were analysed with a novel method based on linear modelling that characterizes the neural tracking of the f0. We studied both the strength and the spatio-temporal properties of the f0-tracking response. Moreover, different samples of continuous speech (six stories by four speakers: two male and two female) were used to investigate the effect of voice characteristics on the f0 response. The results indicated that response strength is inversely related to f0 frequency and rate of f0 change throughout the story. As a result, the male-narrated stories in this study (low and steady f0) evoked stronger f0-tracking compared to female-narrated stories (high and variable f0), for which many responses were not significant. The spatio-temporal analysis revealed that f0-tracking response generators were not fixed in the brainstem but were voice-dependent as well. Voices with high and variable f0 evoked subcortically dominated responses with a latency between 7 and 12 ms. Voices with low and steady f0 evoked responses that are both subcortically (latency of 13-15 ms) and cortically (latency of 23-26 ms) generated, with the right primary auditory cortex as a likely cortical source. Finally, additional experiments revealed that response strength greatly improves for voices with strong higher harmonics, which is particularly useful to boost the small responses evoked by voices with high f0.
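
A rough sketch of how an f0-based stimulus feature could be constructed for this kind of tracking analysis, using librosa's YIN pitch tracker and a band-pass filter; the file name, frequency range, and filter settings are assumptions for illustration, not the authors' exact method.

```python
import numpy as np
import librosa
from scipy.signal import butter, sosfiltfilt

y, sr = librosa.load("story_male_talker.wav", sr=16000)  # hypothetical file

# estimate the f0 contour of the voice (typical male range roughly 75-300 Hz)
f0 = librosa.yin(y, fmin=75, fmax=300, sr=sr)

# stimulus feature for f0-tracking: speech band-passed around the median f0,
# to be related to EEG with a linear (TRF-style) model
lo, hi = np.nanmedian(f0) * 0.8, np.nanmedian(f0) * 1.2
sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
f0_band = sosfiltfilt(sos, y)
```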


Subject(s)
Auditory Cortex , Speech Perception , Voice , Acoustic Stimulation , Auditory Perception , Brain Stem , Female , Humans , Male , Speech
19.
Neuroimage ; 219: 116936, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32474080

ABSTRACT

Natural speech builds on contextual relations that can prompt predictions of upcoming utterances. To study the neural underpinnings of such predictive processing we asked 10 healthy adults to listen to a 1-h-long audiobook while their magnetoencephalographic (MEG) brain activity was recorded. We correlated the MEG signals with acoustic speech envelope, as well as with estimates of Bayesian word probability with and without the contextual word sequence (N-gram and Unigram, respectively), with a focus on time-lags. The MEG signals of auditory and sensorimotor cortices were strongly coupled to the speech envelope at the rates of syllables (4-8 Hz) and of prosody and intonation (0.5-2 Hz). The probability structure of word sequences, independently of the acoustical features, affected the ≤ 2-Hz signals extensively in auditory and rolandic regions, in precuneus, occipital cortices, and lateral and medial frontal regions. Fine-grained temporal progression patterns occurred across brain regions 100-1000 ms after word onsets. Although the acoustic effects were observed in both hemispheres, the contextual influences were statistically significantly lateralized to the left hemisphere. These results serve as a brain signature of the predictability of word sequences in listened continuous speech, confirming and extending previous results to demonstrate that deeply-learned knowledge and recent contextual information are employed dynamically and in a left-hemisphere-dominant manner in predicting the forthcoming words in natural speech.
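
To illustrate the contrast between contextual (N-gram) and context-free (Unigram) word probabilities, a toy bigram/unigram example on an invented sentence; the study's actual estimates came from much larger language models.

```python
import math
from collections import Counter, defaultdict

tokens = "the cat sat on the mat and the cat slept".split()

unigram = Counter(tokens)
bigram = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    bigram[prev][nxt] += 1

def unigram_logprob(word):
    """Context-free log probability of a word."""
    return math.log2(unigram[word] / len(tokens))

def ngram_logprob(word, prev):
    """Contextual (bigram) probability; a stand-in for the study's N-gram estimates."""
    return math.log2(bigram[prev][word] / sum(bigram[prev].values()))

print(unigram_logprob("cat"))       # without context
print(ngram_logprob("cat", "the"))  # with the preceding word as context
```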


Subject(s)
Brain/physiology , Speech Perception/physiology , Acoustic Stimulation , Adult , Attention/physiology , Auditory Cortex/physiology , Brain Mapping , Female , Humans , Magnetoencephalography , Male , Middle Aged , Speech/physiology , Young Adult
20.
Eur J Neurosci ; 51(5): 1364-1376, 2020 03.
Article in English | MEDLINE | ID: mdl-29888819

ABSTRACT

During natural speech perception, humans must parse temporally continuous auditory and visual speech signals into sequences of words. However, most studies of speech perception present only single words or syllables. We used electrocorticography (subdural electrodes implanted on the brains of epileptic patients) to investigate the neural mechanisms for processing continuous audiovisual speech signals consisting of individual sentences. Using partial correlation analysis, we found that posterior superior temporal gyrus (pSTG) and medial occipital cortex tracked both the auditory and the visual speech envelopes. These same regions, as well as inferior temporal cortex, responded more strongly to a dynamic video of a talking face compared to auditory speech paired with a static face. Occipital cortex and pSTG carry temporal information about both auditory and visual speech dynamics. Visual speech tracking in pSTG may be a mechanism for enhancing perception of degraded auditory speech.
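
A small illustration of partial correlation, the statistic used to separate auditory and visual envelope tracking, on simulated signals; the regression-residual formulation here is standard, but the data and coefficients are invented.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after regressing z out of both."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
visual_env = rng.standard_normal(1000)
auditory_env = 0.6 * visual_env + rng.standard_normal(1000)  # correlated envelopes
neural = 0.5 * auditory_env + 0.2 * visual_env + rng.standard_normal(1000)

print(np.corrcoef(neural, auditory_env)[0, 1])        # simple correlation
print(partial_corr(neural, auditory_env, visual_env)) # controlling for the visual envelope
```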


Subject(s)
Auditory Cortex , Speech Perception , Acoustic Stimulation , Auditory Perception , Brain Mapping , Electrocorticography , Humans , Occipital Lobe , Speech , Visual Perception