ABSTRACT
In recent years, speech perception research has benefited from tracking low-frequency neural entrainment to the speech envelope. However, the respective roles of the speech envelope and the temporal fine structure in speech perception remain controversial, especially for Mandarin. This study examined how the perception of Mandarin syllables and tones depends on the speech envelope and the temporal fine structure. We recorded the electroencephalogram (EEG) of subjects under three acoustic conditions constructed with auditory chimera synthesis: (i) the original speech, (ii) the speech envelope imposed on a sinusoidal carrier, and (iii) the temporal fine structure of speech combined with the envelope of a non-speech (white-noise) sound. We found that syllable perception depended mainly on the speech envelope, whereas tone perception depended on the temporal fine structure. The delta band was prominent, and the parietal and prefrontal lobes were the main activated brain areas, for both syllable and tone perception. Finally, we decoded the spatiotemporal features of Mandarin perception from the microstate sequence. The spatiotemporal feature sequence of the EEG evoked by the speech material was found to be specific, suggesting a new perspective for future auditory brain-computer interfaces. These results also suggest a new coding strategy for hearing aids designed for native Mandarin speakers.
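As an illustration of the chimera approach described above, the following minimal Python sketch builds a single-band "auditory chimera" by swapping the Hilbert envelope and temporal fine structure between a speech signal and another carrier. The study's actual synthesis is typically multi-band (per filterbank channel); the single-band version, the sampling rate, and the 500 Hz sinusoidal carrier are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_and_tfs(x):
    """Split a signal into its Hilbert envelope and temporal fine structure."""
    analytic = hilbert(x)
    env = np.abs(analytic)              # slow amplitude envelope
    tfs = np.cos(np.angle(analytic))    # unit-amplitude fine structure
    return env, tfs

fs = 16000
t = np.arange(fs) / fs
speech = np.random.randn(fs)            # stand-in for a real speech waveform
noise = np.random.randn(fs)             # white-noise envelope donor

sp_env, sp_tfs = envelope_and_tfs(speech)
nz_env, _ = envelope_and_tfs(noise)

# Condition (ii): speech envelope imposed on a sinusoidal carrier
chimera_env = sp_env * np.sin(2 * np.pi * 500 * t)
# Condition (iii): speech fine structure carrying a non-speech (noise) envelope
chimera_tfs = nz_env * sp_tfs
```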
Subjects
Speech Perception, Humans, Noise, Timbre Perception, Speech Acoustics, Electroencephalography, Acoustic Stimulation
ABSTRACT
A growing number of studies have investigated temporal processing deficits in dyslexia. These studies largely focus on neural synchronization to speech. However, the importance of rise times for neural synchronization is often overlooked. Furthermore, targeted interventions, both phonics-based and auditory, are being developed, but little is known about their impact. The current study investigated the impact of a 12-week tablet-based intervention. Children at risk for dyslexia received phonics-based training, either with (n = 31) or without (n = 31) auditory training, or engaged in active control training (n = 29). Additionally, neural synchronization and processing of rise times were longitudinally investigated in children with dyslexia (n = 26) and typical readers (n = 52) from pre-reading (5 years) to beginning reading age (7 years). The three time points in the longitudinal study correspond to intervention pre-test, post-test, and consolidation, approximately 1 year after completing the intervention. At each time point, neural synchronization was measured to sinusoidal stimuli and to pulsatile stimuli with shortened rise times at the syllable (4 Hz) and phoneme (20 Hz) rates. Our results revealed no impact of the phonics-based and auditory training on neural synchronization at the syllable and phoneme rates. However, we did reveal atypical hemispheric specialization at both syllable and phoneme rates in children with dyslexia. This was detected even before the onset of reading acquisition, pointing towards a possible causal rather than consequential mechanism in dyslexia. This study contributes to our understanding of the temporal processing deficits underlying the development of dyslexia, but also shows that the development of targeted interventions is still a work in progress.
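The stimulus contrast used in this study (sinusoidal versus pulsatile amplitude modulation with shortened rise times) can be sketched as below. The 10 ms rise time, the linear rise/fall shape, and the white-noise carrier are illustrative assumptions, not the authors' exact parameters.

```python
import numpy as np

def am_stimulus(rate, rise, dur=2.0, fs=44100, pulsatile=True):
    """White noise amplitude-modulated at `rate` Hz.
    Pulsatile: linear rise over `rise` s, then linear fall; else sinusoidal."""
    t = np.arange(int(dur * fs)) / fs
    if pulsatile:
        phase = (t * rate) % 1.0           # position within each cycle (0-1)
        rf = rise * rate                   # rise duration as a cycle fraction
        mod = np.where(phase < rf, phase / rf, 1 - (phase - rf) / (1 - rf))
    else:
        mod = 0.5 * (1 + np.sin(2 * np.pi * rate * t))
    return mod * np.random.randn(len(t))

syll_pulse = am_stimulus(rate=4, rise=0.010)              # syllable rate, sharp rise
phon_pulse = am_stimulus(rate=20, rise=0.010)             # phoneme rate, sharp rise
syll_sine = am_stimulus(rate=4, rise=0, pulsatile=False)  # sinusoidal control
```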
Subjects
Dyslexia, Speech Perception, Child, Humans, Longitudinal Studies, Dyslexia/therapy, Reading, Speech
ABSTRACT
The most prominent acoustic features in speech are intensity modulations, represented by the amplitude envelope of speech. Synchronization of neural activity with these modulations supports speech comprehension. As the acoustic modulation of speech is related to the production of syllables, investigations of neural speech tracking commonly do not distinguish between lower-level acoustic (envelope modulation) and higher-level linguistic (syllable rate) information. Here we manipulated speech intelligibility using noise-vocoded speech and investigated the spectral dynamics of neural speech processing, across two studies at cortical and subcortical levels of the auditory hierarchy, using magnetoencephalography. Overall, cortical regions mostly track the syllable rate, whereas subcortical regions track the acoustic envelope. Furthermore, with less intelligible speech, tracking of the modulation rate becomes more dominant. Our study highlights the importance of distinguishing between envelope modulation and syllable rate and provides novel possibilities to better understand differences between auditory processing and speech/language processing disorders.
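Noise vocoding, the manipulation used here to vary intelligibility, replaces the fine structure in each frequency band with noise while preserving the band envelopes. A minimal sketch, assuming four log-spaced channels and fourth-order Butterworth filters (both illustrative choices):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, band_edges):
    """Replace per-band fine structure with noise, keeping band envelopes."""
    out = np.zeros_like(x)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                    # per-band envelope
        carrier = sosfiltfilt(sos, np.random.randn(len(x)))
        out += env * carrier                           # envelope-modulated noise
    return out

fs = 16000
speech = np.random.randn(4 * fs)                       # stand-in for real speech
edges = np.geomspace(100, 7000, num=5)                 # 4 log-spaced channels
vocoded = noise_vocode(speech, fs, edges)
```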
Subjects
Speech Perception, Speech, Humans, Magnetoencephalography, Noise, Cognition, Acoustic Stimulation, Speech Intelligibility
ABSTRACT
Using the source-filter model of speech production, clean speech signals can be decomposed into an excitation component and an envelope component that is related to the phoneme being uttered. Therefore, restoring the envelope of degraded speech during speech enhancement can improve the intelligibility and quality of output. As the number of phonemes in spoken speech is limited, they can be adequately represented by a correspondingly limited number of envelopes. This can be exploited to improve the estimation of speech envelopes from a degraded signal in a data-driven manner. The improved envelopes are then used in a second stage to refine the final speech estimate. Envelopes are typically derived from the linear prediction coefficients (LPCs) or from the cepstral coefficients (CCs). The improved envelope is obtained either by mapping the degraded envelope onto pre-trained codebooks (classification approach) or by directly estimating it from the degraded envelope (regression approach). In this work, we first investigate the optimal features for envelope representation and codebook generation by a series of oracle tests. We demonstrate that CCs provide better envelope representation compared to using the LPCs. Further, we demonstrate that a unified speech codebook is advantageous compared to the typical codebook that manually splits speech and silence as separate entries. Next, we investigate low-complexity neural network architectures to map degraded envelopes to the optimal codebook entry in practical systems. We confirm that simple recurrent neural networks yield good performance with a low complexity and number of parameters. We also demonstrate that with a careful choice of the feature and architecture, a regression approach can further improve the performance at a lower computational cost. However, as also seen from the oracle tests, the benefit of the two-stage framework is now chiefly limited by the statistical noise floor estimate, leading to only a limited improvement in extremely adverse conditions. This highlights the need for further research on joint estimation of speech and noise for optimum enhancement.
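The two envelope representations compared in this work can be computed as follows. This is a hedged sketch of textbook LPC (autocorrelation method) and truncated-cepstrum spectral envelopes, with illustrative model orders and FFT size, not the authors' exact feature pipeline.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_envelope_db(frame, order, nfft=512):
    """Spectral envelope from LPCs (autocorrelation method), in dB."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    a_poly = np.concatenate(([1.0], -a))      # A(z) = 1 - sum_k a_k z^-k
    return -20 * np.log10(np.abs(np.fft.rfft(a_poly, nfft)) + 1e-12)

def cepstral_envelope_db(frame, n_ceps, nfft=512):
    """Spectral envelope from a truncated (low-quefrency) real cepstrum, in dB."""
    log_mag = np.log(np.abs(np.fft.rfft(frame, nfft)) + 1e-12)
    ceps = np.fft.irfft(log_mag)
    ceps[n_ceps:-n_ceps] = 0                  # lifter: keep low quefrencies
    return 20 / np.log(10) * np.fft.rfft(ceps).real

frame = np.hanning(512) * np.random.randn(512)   # stand-in windowed frame
env_lpc = lpc_envelope_db(frame, order=16)
env_cc = cepstral_envelope_db(frame, n_ceps=20)
```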
Subjects
Speech Perception, Speech, Noise, Neural Networks (Computer), Cognition
ABSTRACT
Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, congruent visual inputs promote comprehension when the acoustic signal is degraded. When this visual input is masked, speech comprehension consequently becomes more difficult. However, it remains unclear which levels of speech processing are affected, and under which circumstances, when the mouth area is occluded. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal-hearing participants via magnetoencephalography (MEG). We additionally included a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model was trained on the clear AV speech and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e., pitch and formant frequencies), while the reconstruction of higher-level speech segmentation features (phoneme and word onsets) was especially impaired by masks in difficult listening situations. As we used surgical face masks, which have only mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.
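The decoding-model idea used here, reconstructing a stimulus feature from time-lagged sensor data by regularized regression, can be sketched as below. The lag window, ridge penalty, and train/test split are illustrative assumptions; the study's actual model was trained on clear AV speech and evaluated per condition.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lag_features(X, lags):
    """Stack time-lagged copies of every channel as regression features."""
    n = len(X)
    feats = []
    for lag in lags:
        shifted = np.zeros_like(X)
        if lag >= 0:
            shifted[lag:] = X[:n - lag]
        else:
            shifted[:n + lag] = X[-lag:]
        feats.append(shifted)
    return np.hstack(feats)

fs = 100
n, n_chan = 6000, 64
meg = np.random.randn(n, n_chan)         # stand-in sensor data
envelope = np.random.randn(n)            # stand-in target speech envelope

lags = range(-25, 1)                     # sensors 0-250 ms after the stimulus
X = lag_features(meg, lags)
dec = Ridge(alpha=1e3).fit(X[:4000], envelope[:4000])       # train
recon = dec.predict(X[4000:])                               # held-out test
score = np.corrcoef(recon, envelope[4000:])[0, 1]           # reconstruction accuracy
```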
Subjects
Speech Perception, Speech, Acoustic Stimulation, Acoustics, Humans, Mouth, Visual Perception
ABSTRACT
Dyslexia has frequently been related to atypical auditory temporal processing and speech perception. Studies emphasizing speech onset cues and reinforcing the temporal structure of the speech envelope, that is, envelope enhancement (EE), have demonstrated reduced speech perception deficits in individuals with dyslexia. Using this strategy as an auditory intervention might thus reduce some of the deficits related to dyslexia. Importantly, reading-skill interventions are most effective when they are provided during kindergarten and first grade. Hence, we provided a tablet-based 12-week auditory and phonics-based intervention to pre-readers at cognitive risk for dyslexia and investigated its effect on auditory temporal processing with a rise time discrimination (RTD) task. Ninety-one pre-readers at cognitive risk for dyslexia (aged 5-6) were assigned to two groups receiving a phonics-based intervention and playing a story-listening game either with (n = 31) or without (n = 31) EE, or to a third group playing control games and listening to non-enhanced stories (n = 29). RTD was measured directly before, directly after, and 1 year after the intervention. While the groups listening to non-enhanced stories mainly improved after the intervention during first grade, the group listening to enhanced stories improved during the intervention in kindergarten and subsequently remained stable during first grade. Hence, an EE intervention improves auditory processing skills important for the development of phonological skills. This occurred before the onset of reading instruction, preceding the maturational improvement of these skills, thus potentially giving at-risk children a head start when learning to read. A video abstract of this article can be viewed at https://www.youtube.com/watch?v=e0BfT4dGXNA.
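One possible form of envelope enhancement, consistent with the description above (emphasizing onset cues in the temporal envelope), is sketched below: the envelope's rectified rate of rise is added back before the modified envelope is reimposed on the carrier. The smoothing cutoff and onset scaling are illustrative; the intervention's exact algorithm may differ.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 16000
speech = np.random.randn(2 * fs)                 # stand-in for real speech

env = np.abs(hilbert(speech))
sos = butter(2, 30, btype="lowpass", fs=fs, output="sos")
env_s = sosfiltfilt(sos, env)                    # smoothed envelope (<30 Hz)

onset = np.maximum(np.gradient(env_s), 0)        # rectified rate of rise
onset *= env_s.max() / (onset.max() + 1e-12)     # scale to the envelope range
enhanced_env = env_s + onset                     # emphasized rise times

carrier = speech / np.maximum(env_s, 1e-6)       # envelope-flattened carrier
enhanced = enhanced_env * carrier                # reimpose enhanced envelope
```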
Subjects
Dyslexia, Speech Perception, Child, Cognition, Dyslexia/psychology, Humans, Phonetics, Reading, Speech
ABSTRACT
Auditory cortical activity entrains to speech rhythms and has been proposed as a mechanism for online speech processing. In particular, neural activity in the theta frequency band (4-8 Hz) tracks the onset of syllables which may aid the parsing of a speech stream. Similarly, cortical activity in the delta band (1-4 Hz) entrains to the onset of words in natural speech and has been found to encode both syntactic as well as semantic information. Such neural entrainment to speech rhythms is not merely an epiphenomenon of other neural processes, but plays a functional role in speech processing: modulating the neural entrainment through transcranial alternating current stimulation influences the speech-related neural activity and modulates the comprehension of degraded speech. However, the distinct functional contributions of the delta- and of the theta-band entrainment to the modulation of speech comprehension have not yet been investigated. Here we use transcranial alternating current stimulation with waveforms derived from the speech envelope and filtered in the delta and theta frequency bands to alter cortical entrainment in both bands separately. We find that transcranial alternating current stimulation in the theta band but not in the delta band impacts speech comprehension. Moreover, we find that transcranial alternating current stimulation with the theta-band portion of the speech envelope can improve speech-in-noise comprehension beyond sham stimulation. Our results show a distinct contribution of the theta- but not of the delta-band stimulation to the modulation of speech comprehension. In addition, our findings open up a potential avenue of enhancing the comprehension of speech in noise.
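Deriving the stimulation waveforms described here is straightforward: the speech envelope is band-pass filtered into delta (1-4 Hz) and theta (4-8 Hz) components, which then drive the transcranial alternating current stimulation. A minimal sketch with assumed filter settings:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 1000
speech_env = np.abs(hilbert(np.random.randn(10 * fs)))    # stand-in envelope

delta_sos = butter(3, [1, 4], btype="bandpass", fs=fs, output="sos")
theta_sos = butter(3, [4, 8], btype="bandpass", fs=fs, output="sos")
delta_wave = sosfiltfilt(delta_sos, speech_env)   # delta-band tACS waveform
theta_wave = sosfiltfilt(theta_sos, speech_env)   # theta-band tACS waveform
```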
Subjects
Cerebral Cortex/physiology, Comprehension/physiology, Delta Rhythm/physiology, Speech Perception/physiology, Theta Rhythm/physiology, Transcranial Direct Current Stimulation, Adult, Female, Humans, Male, Noise, Young Adult
ABSTRACT
When listening to natural speech, our brain activity tracks the slow amplitude modulations of speech, also called the speech envelope. Moreover, recent research has demonstrated that this neural envelope tracking can be affected by top-down processes. The present study was designed to examine if neural envelope tracking is modulated by the effort that a person expends during listening. Five measures were included to quantify listening effort: two behavioral measures based on a novel dual-task paradigm, a self-report effort measure, and two neural measures related to phase synchronization and alpha power. Electroencephalography responses to sentences, presented at a wide range of subject-specific signal-to-noise ratios, were recorded in thirteen young, normal-hearing adults. A comparison of the five measures revealed different effects of listening effort as a function of speech understanding. Reaction times on the primary task and self-reported effort decreased with increasing speech understanding. In contrast, reaction times on the secondary task and alpha power showed a peak-shaped behavior, with the highest effort at intermediate speech understanding levels. With regard to neural envelope tracking, we found that the reaction times on the secondary task and self-reported effort explained a small part of the variability in theta-band envelope tracking. Speech understanding was found to strongly modulate neural envelope tracking. More specifically, our results demonstrated a robust increase in envelope tracking with increasing speech understanding. The present study provides new insights into the relations among different effort measures and highlights the potential of neural envelope tracking to objectively measure speech understanding in young, normal-hearing adults.
Subjects
Speech Perception, Adult, Auditory Perception, Humans, Reaction Time, Self Report, Speech
ABSTRACT
Recognizing speech in noisy environments is a challenging task that involves both auditory and language mechanisms. Previous studies have demonstrated that the human auditory cortex can reliably track the temporal envelope of speech in noisy environments, which provides a plausible neural basis for noise-robust speech recognition. The current study aimed at teasing apart auditory and language contributions to noise-robust envelope tracking by comparing the neural responses of 2 groups of listeners, i.e., native listeners and foreign listeners who did not understand the testing language. In the experiment, speech signals were mixed with spectrally matched stationary noise at 4 intensity levels and listeners' neural responses were recorded using electroencephalography (EEG). When the noise intensity increased, the neural response gain increased in both groups of listeners, demonstrating auditory gain control. Language comprehension generally reduced the response gain and envelope-tracking precision, and modulated the spatial and temporal profile of envelope-tracking activity. Based on the spatio-temporal dynamics of envelope-tracking activity, a linear classifier could jointly decode the 2 listener groups and 4 levels of noise intensity. Altogether, the results showed that without feedback from language processing, auditory mechanisms such as gain control can lead to a noise-robust speech representation. High-level language processing modulated the spatio-temporal profile of the neural representation of the speech envelope, instead of generally enhancing the envelope representation.
Subjects
Brain/physiology, Language, Noise, Speech Perception/physiology, Adolescent, Adult, Comprehension/physiology, Female, Humans, Male, Young Adult
ABSTRACT
Viewing a speaker's lip movements can improve the brain's ability to 'track' the amplitude envelope of the auditory speech signal and facilitate intelligibility. Whether such neurobehavioral benefits can also arise from tactually sensing the speech envelope on the skin is unclear. We hypothesized that tactile speech envelopes can improve neural tracking of auditory speech and thereby facilitate intelligibility. To test this, we applied continuous auditory speech and vibrotactile speech-envelope-shaped stimulation at various asynchronies to the ears and index fingers of normal-hearing human listeners while simultaneously assessing speech-recognition performance and cortical speech-envelope tracking with electroencephalography. Results indicate that tactile speech-shaped envelopes improve the cortical tracking, but not intelligibility, of degraded auditory speech. The cortical speech-tracking benefit occurs for tactile input leading the auditory input by 100 ms or less, emerges in the EEG during an early time window (~0-150 ms), and in particular involves cortical activity in the delta (1-4 Hz) range. These characteristics hint at a predictive mechanism for multisensory integration of complex slow time-varying inputs that might play a role in tactile speech communication.
Subjects
Cerebral Cortex/physiology, Delta Rhythm/physiology, Electroencephalography, Speech Intelligibility, Speech Perception/physiology, Touch Perception/physiology, Adolescent, Adult, Female, Humans, Male, Middle Aged, Physical Stimulation, Time Factors, Young Adult
ABSTRACT
When we grow older, understanding speech in noise becomes more challenging. Research has demonstrated the role of auditory temporal and cognitive deficits in these age-related speech-in-noise difficulties. To better understand the underlying neural mechanisms, we recruited young, middle-aged, and older normal-hearing adults and investigated the interplay between speech understanding, cognition, and neural tracking of the speech envelope using electroencephalography. The stimuli consisted of natural speech masked by speech-weighted noise or a competing talker and were presented at several subject-specific speech understanding levels. In addition to running speech, we recorded auditory steady-state responses at low modulation frequencies to assess the effect of age on nonspeech sounds. The results show that healthy aging resulted in a supralinear increase in the speech reception threshold, i.e., worse speech understanding, most pronounced for the competing talker. Similarly, advancing age was associated with a supralinear increase in envelope tracking, with a pronounced enhancement for older adults. Additionally, envelope tracking was found to increase with speech understanding, most apparent for older adults. Because we found that worse cognitive scores were associated with enhanced envelope tracking, our results support the hypothesis that enhanced envelope tracking in older adults is the result of a higher activation of brain regions for processing speech, compared with younger adults. From a cognitive perspective, this could reflect the inefficient use of cognitive resources, often observed in behavioral studies. Interestingly, the opposite effect of age was found for auditory steady-state responses, suggesting a complex interplay of different neural mechanisms with advancing age. NEW & NOTEWORTHY: We measured neural tracking of the speech envelope across the adult lifespan and found a supralinear increase in envelope tracking with age. Using a more ecologically valid approach than auditory steady-state responses, we found that young, middle-aged, and older normal-hearing adults showed an increase in envelope tracking with increasing speech understanding and that this association is stronger for older adults.
Subjects
Aging/physiology, Cerebral Cortex/physiology, Comprehension/physiology, Speech Perception/physiology, Acoustic Stimulation, Adolescent, Adult, Aged, Aged (80 and over), Electroencephalography, Female, Humans, Male, Middle Aged, Perceptual Masking/physiology, Psycholinguistics, Young Adult
ABSTRACT
Temporal cues are important for discerning word boundaries and syllable segments in speech; their perception facilitates language acquisition and development. Beat synchronization and neural encoding of speech reflect precision in processing temporal cues and have been linked to reading skills. In poor readers, diminished neural precision may contribute to rhythmic and phonological deficits. Here we establish links between beat synchronization and speech processing in children who have not yet begun to read: preschoolers who can entrain to an external beat have more faithful neural encoding of temporal modulations in speech and score higher on tests of early language skills. In summary, we propose precise neural encoding of temporal modulations as a key mechanism underlying reading acquisition. Because beat synchronization abilities emerge at an early age, these findings may inform strategies for early detection of and intervention for language-based learning disabilities.
Subjects
Neural Pathways/physiology, Reading, Speech Perception/physiology, Speech/physiology, Acoustic Stimulation/methods, Analysis of Variance, Auditory Perception/physiology, Preschool Child, Cues (Psychology), Electrodes, Electrophysiology/instrumentation, Electrophysiology/methods, Female, Humans, Language Development, Learning/physiology, Male, Phonetics
ABSTRACT
The temporal envelope of speech is an important cue contributing to speech intelligibility. Theories about the neural foundations of speech perception postulate that the left and right auditory cortices are functionally specialized in analyzing speech envelope information at different time scales: the right hemisphere is thought to be specialized in processing syllable rate modulations, whereas a bilateral or left hemispheric specialization is assumed for phoneme rate modulations. Recently, it has been found that this functional hemispheric asymmetry is different in individuals with language-related disorders such as dyslexia. Most studies were, however, performed in adults and school-aged children, and little is known about how neural auditory processing at these specific rates manifests and develops in very young children before reading acquisition. Yet, studying hemispheric specialization for processing syllable and phoneme rate modulations in preliterate children may reveal early neural markers for dyslexia. In the present study, human cortical evoked potentials to syllable and phoneme rate modulations were measured in 5-year-old children at high and low hereditary risk for dyslexia. The results demonstrate a right hemispheric preference for processing syllable rate modulations and a symmetric pattern for phoneme rate modulations, regardless of hereditary risk for dyslexia. These results suggest that, while hemispheric specialization for processing syllable rate modulations seems to be mature in prereading children, hemispheric specialization for phoneme rate modulation processing may still be developing. These findings could have important implications for the development of phonological and reading skills.
Subjects
Cerebrum/physiology, Cerebral Dominance/physiology, Speech Perception/physiology, Preschool Child, Electroencephalography, Auditory Evoked Potentials, Female, Humans, Male
ABSTRACT
An objective method for assessing speech audibility is essential to evaluate hearing aid benefit in children who are unable to participate in hearing tests. With consonant-vowel syllables, brainstem-dominant responses elicited at the voice fundamental frequency have proven successful for assessing audibility. This study aimed to harness the neural activity elicited by the slow envelope of the same repetitive consonant-vowel syllables to assess audibility. In adults and children with normal hearing and children with hearing loss wearing hearing aids, neural activity elicited by the stimulus /suʃi/ or /saʃi/ presented at 55-75 dB SPL was analyzed using the temporal response function approach. No-stimulus runs or a very low stimulus level (15 dB SPL) were used to simulate inaudible conditions in adults and children with normal hearing. Both groups of children demonstrated higher response amplitudes relative to adults. Detectability (sensitivity; true positive rate) ranged between 80.1% and 100%, and did not vary by group or stimulus level but varied by stimulus, with /saʃi/ achieving 100% detectability at 65 dB SPL. The average minimum time needed to detect a response ranged between 3.7 and 6.4 min across stimuli and listener groups, with the shortest times recorded for stimulus /saʃi/ and in children with hearing loss. Specificity was >94.9%. Responses to the slow envelope of non-meaningful consonant-vowel syllables can be used to ascertain audible vs. inaudible speech with sufficient accuracy within clinically feasible test times. Such responses can increase the clinical usefulness of existing objective approaches to evaluate hearing aid benefit.
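The temporal response function (TRF) approach mentioned here fits a linear forward model from the time-lagged stimulus envelope to each EEG channel, typically by ridge regression. A minimal sketch with an assumed 0-300 ms lag window and regularization strength:

```python
import numpy as np
from sklearn.linear_model import Ridge

fs = 128
n = 60 * fs
env = np.random.randn(n)                       # stand-in slow stimulus envelope
eeg = np.random.randn(n, 32)                   # stand-in EEG channels

lags = np.arange(0, int(0.3 * fs))             # 0-300 ms response window
X = np.zeros((n, len(lags)))
for j, lag in enumerate(lags):                 # build the envelope lag matrix
    X[lag:, j] = env[:n - lag]

trf = Ridge(alpha=100.0).fit(X, eeg)           # one TRF per channel
weights = trf.coef_                            # shape: (n_channels, n_lags)
```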
Subjects
Deafness, Hearing Aids, Sensorineural Hearing Loss, Hearing Loss, Speech Perception, Adult, Child, Humans, Speech, Speech Perception/physiology, Hearing Loss/diagnosis, Sensorineural Hearing Loss/rehabilitation
ABSTRACT
Assessing cognitive function, especially language processing, in severely brain-injured patients is critical for prognostication, care, and development of communication devices (e.g. brain-computer interfaces). In patients with diminished motor function, language processing has been probed using EEG measures of command-following in motor imagery tasks. While such tests eliminate the need for a motor response, they require sustained attention. However, passive listening tasks with an EEG response measure can reduce both motor and attentional demands. These considerations motivated the development of two assays of low-level language processing: identification of differential phoneme-class responses and tracking of the natural speech envelope. This cross-sectional study looks at a cohort of 26 severely brain-injured patients and 10 healthy controls. Patients' level of function was assessed at the bedside via the Coma Recovery Scale-Revised. Patients were also tested for command-following via EEG and/or MRI assays of motor imagery. For the present investigation, EEG was recorded while presenting a 148 s audio clip of Alice in Wonderland. Time-locked EEG responses to phoneme classes were extracted and compared to determine a differential phoneme-class response. Tracking of the natural speech envelope was assessed from the same recordings by cross-correlating the EEG response with the speech envelope. In healthy controls, the dynamics of the two measures were temporally similar but spatially different: a central parieto-occipital component of the differential phoneme-class response was absent in the natural speech envelope response. The differential phoneme-class response was present in all patients, including the six classified as vegetative state/unresponsive wakefulness syndrome by behavioural assessment. However, patients with evidence of language processing, either by behavioural assessment or motor imagery tests, had an early bilateral response in the first 50 ms that was lacking in patients without any evidence of language processing. The natural speech envelope tracking response was also present in all patients, and responses in the first 100 ms distinguished patients with evidence of language processing. Specifically, patients with evidence of language processing had a more global response in the first 100 ms, whereas those without evidence of language processing had a frontopolar response in that period. In summary, we developed two passive EEG-based methods to probe low-level language processing in severely brain-injured patients. In our cohort, both assays showed a difference between patients with evidence of command-following and those with no evidence of command-following: a more prominent early bilateral response component.
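The envelope-tracking measure used in this study, cross-correlating the EEG with the natural speech envelope over a range of lags, can be sketched as follows; the sampling rate and lag window are illustrative assumptions.

```python
import numpy as np

fs = 250
n = 148 * fs                                   # ~148 s clip, as in the study
env = np.random.randn(n)                       # stand-in speech envelope
eeg = np.random.randn(n, 32)                   # stand-in EEG channels

env = (env - env.mean()) / env.std()
eeg = eeg - eeg.mean(axis=0)
max_lag = int(0.5 * fs)                        # examine 0-500 ms lags
xcorr = np.zeros((max_lag, eeg.shape[1]))
for lag in range(max_lag):
    e = eeg[lag:, :]                           # EEG lagging the envelope
    s = env[:n - lag, None]
    xcorr[lag] = (e * s).mean(axis=0) / e.std(axis=0)   # normalized xcorr
```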
ABSTRACT
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between the speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of the speech envelope, characterized by the modulation spectrum, reliably correlates with the syllable rate only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% of the variance of the speech envelope. These results indicate that local features in the speech envelope, rather than the modulation spectrum, are a more reliable acoustic correlate of syllables.
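The modulation-spectrum analysis examined here can be computed directly from the broadband envelope. A hedged sketch follows; the envelope cutoff, downsampling rate, Welch settings, and 1-32 Hz search band are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt, welch

fs = 16000
speech = np.random.randn(60 * fs)              # stand-in for a long recording

env = np.abs(hilbert(speech))
sos = butter(3, 32, btype="lowpass", fs=fs, output="sos")
env_ds = sosfiltfilt(sos, env)[::fs // 100]    # smooth, downsample to 100 Hz

f, pxx = welch(env_ds - env_ds.mean(), fs=100, nperseg=512)
band = (f >= 1) & (f <= 32)                    # plausible modulation range
center_freq = f[band][np.argmax(pxx[band])]    # modulation-spectrum peak
```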
Subjects
Speech Perception, Speech, Humans, Acoustic Stimulation, Acoustics, Periodicity
ABSTRACT
Neural synchronization to amplitude-modulated noise at three frequencies (2 Hz, 5 Hz, 8 Hz) thought to be important for syllable perception was investigated in English-speaking school-aged children. The theoretically important delta band (~2 Hz, the stressed-syllable level) was included along with two syllable-level rates. The auditory steady state response (ASSR) was recorded using EEG in 36 7-to-12-year-old children. Half of the sample had either dyslexia or dyslexia and DLD (developmental language disorder). In comparison to typically-developing children, children with dyslexia or with dyslexia and DLD showed reduced ASSRs for 2 Hz stimulation but similar ASSRs at 5 Hz and 8 Hz. These novel data for English ASSRs converge with prior data suggesting that children with dyslexia have atypical synchrony between brain oscillations and incoming auditory stimulation at ~2 Hz, the rate of stressed syllable production across languages. This atypical synchronization likely impairs speech processing, phonological processing, and possibly syntactic processing, as predicted by Temporal Sampling theory.
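A minimal sketch of the paradigm: white noise amplitude-modulated at 2, 5, and 8 Hz, with ASSR strength estimated as spectral power at the modulation frequency relative to neighbouring FFT bins. Stimulus duration, EEG sampling rate, and the number of side bins are illustrative assumptions.

```python
import numpy as np

def am_noise(fm, dur, fs):
    """White noise 100% amplitude-modulated at fm Hz."""
    t = np.arange(int(dur * fs)) / fs
    return (1 + np.sin(2 * np.pi * fm * t)) * np.random.randn(len(t))

def assr_snr(eeg_channel, fs, fm, n_side=10):
    """Spectral power at fm relative to the mean of neighbouring FFT bins."""
    spec = np.abs(np.fft.rfft(eeg_channel)) ** 2
    freqs = np.fft.rfftfreq(len(eeg_channel), 1 / fs)
    k = np.argmin(np.abs(freqs - fm))
    side = np.r_[spec[k - n_side:k], spec[k + 1:k + 1 + n_side]]
    return spec[k] / side.mean()

stimuli = {fm: am_noise(fm, dur=4.0, fs=44100) for fm in (2, 5, 8)}
eeg = np.random.randn(60 * 250)                # stand-in 60 s EEG at 250 Hz
snrs = {fm: assr_snr(eeg, fs=250, fm=fm) for fm in (2, 5, 8)}
```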
Subjects
Dyslexia, Speech Perception, Humans, Child, Speech, Acoustic Stimulation, Speech Perception/physiology, Noise
ABSTRACT
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers, an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model: one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high-fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and adaptable based on the specific task and environment.
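The AV versus A+V model comparison at the heart of this analysis can be sketched conceptually as follows: a jointly trained audiovisual model is compared against the summed predictions of independently trained audio-only and visual-only models. The feature construction here is heavily simplified and all values are illustrative stand-ins, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

n = 5000
audio_feat = np.random.randn(n, 20)     # lagged envelope features (stand-in)
visual_feat = np.random.randn(n, 20)    # lagged lip-movement features (stand-in)
eeg = np.random.randn(n)                # one EEG channel (stand-in)

both = np.hstack([audio_feat, visual_feat])
av = Ridge(alpha=1e2).fit(both[:4000], eeg[:4000])        # joint AV model
a = Ridge(alpha=1e2).fit(audio_feat[:4000], eeg[:4000])   # audio-only model
v = Ridge(alpha=1e2).fit(visual_feat[:4000], eeg[:4000])  # visual-only model

pred_av = av.predict(both[4000:])
pred_sum = a.predict(audio_feat[4000:]) + v.predict(visual_feat[4000:])
r_av = np.corrcoef(pred_av, eeg[4000:])[0, 1]
r_sum = np.corrcoef(pred_sum, eeg[4000:])[0, 1]   # r_av > r_sum suggests integration
```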
ABSTRACT
Cochlear implants (CIs) are commonly used to restore the ability to hear in those with severe or profound hearing loss. CIs provide CI users with the auditory feedback necessary to monitor and control speech production. However, the speech produced by CI users may not be fully restored to a perceived sound quality similar to that of speech produced by normal-hearing talkers, and this difference is easily noticeable in daily conversation. In this study, we address this difference as perceived by normal-hearing listeners listening to continuous speech produced by CI talkers and normal-hearing talkers. We used a regenerative model to decode and reconstruct the speech envelope from the single-trial electroencephalogram (EEG) recorded on the scalp of the normal-hearing listeners. The bootstrap Spearman correlation between the actual speech envelope and the envelope reconstructed from the EEG was computed as a metric to quantify the difference in response to the speech produced by the two talker groups. The same listeners were asked to rate the perceived sound quality of the speech produced by the two talker groups as a behavioral sound-quality assessment. The results show that both the perceived sound-quality ratings and the computed metric, which can be seen as the degree of cortical entrainment to the actual speech envelope across the normal-hearing listeners, were higher for speech produced by normal-hearing talkers than for speech produced by CI talkers. The first purpose of the study was to determine how well the speech envelope is represented neurophysiologically, via its similarity to the envelope reconstructed from the EEG. The second purpose was to show how well this representation differentiates the CI and normal-hearing talker groups in terms of perceived sound quality.
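The bootstrap Spearman metric described above can be sketched as follows: time segments of the actual and EEG-reconstructed envelopes are resampled with replacement, and the Spearman correlation is computed per resample. The segment length and number of resamples are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
actual = np.random.randn(6000)           # stand-in actual speech envelope
recon = actual + np.random.randn(6000)   # stand-in EEG-reconstructed envelope

seg = 500                                # segment length in samples
starts = np.arange(0, len(actual) - seg, seg)
rhos = []
for _ in range(200):                     # bootstrap over segments
    picks = rng.choice(starts, size=len(starts), replace=True)
    idx = np.concatenate([np.arange(s, s + seg) for s in picks])
    rhos.append(spearmanr(actual[idx], recon[idx]).correlation)
rho_mean = np.mean(rhos)                 # cortical-tracking metric
```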