Búsqueda | Biblioteca Virtual en Salud

1.

Abnormal connectivity and activation during audiovisual speech perception in schizophrenia.

Hirano, Yoji; Nakamura, Itta; Tamura, Shunsuke.

Eur J Neurosci ; 59(8): 1918-1932, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-37990611

RESUMEN

The unconscious integration of vocal and facial cues during speech perception facilitates face-to-face communication. Recent studies have provided substantial behavioural evidence concerning impairments in audiovisual (AV) speech perception in schizophrenia. However, the specific neurophysiological mechanism underlying these deficits remains unknown. Here, we investigated activities and connectivities centered on the auditory cortex during AV speech perception in schizophrenia. Using magnetoencephalography, we recorded and analysed event-related fields in response to auditory (A: voice), visual (V: face) and AV (voice-face) stimuli in 23 schizophrenia patients (13 males) and 22 healthy controls (13 males). The functional connectivity associated with the subadditive response to AV stimulus (i.e., [AV] < [A] + [V]) was also compared between the two groups. Within the healthy control group, [AV] activity was smaller than the sum of [A] and [V] at latencies of approximately 100 ms in the posterior ramus of the lateral sulcus in only the left hemisphere, demonstrating a subadditive N1m effect. Conversely, the schizophrenia group did not show such a subadditive response. Furthermore, weaker functional connectivity from the posterior ramus of the lateral sulcus of the left hemisphere to the fusiform gyrus of the right hemisphere was observed in schizophrenia. Notably, this weakened connectivity was associated with the severity of negative symptoms. These results demonstrate abnormalities in connectivity between speech- and face-related cortical areas in schizophrenia. This aberrant subadditive response and connectivity deficits for integrating speech and facial information may be the neural basis of social communication dysfunctions in schizophrenia.

Asunto(s)

Corteza Auditiva , Esquizofrenia , Percepción del Habla , Masculino , Humanos , Percepción del Habla/fisiología , Magnetoencefalografía , Habla/fisiología , Percepción Visual/fisiología , Percepción Auditiva/fisiología , Estimulación Acústica/métodos

2.

Children with developmental dyslexia have equivalent audiovisual speech perception performance but their perceptual weights differ.

Gijbels, Liesbeth; Lee, Adrian K C; Yeatman, Jason D.

Dev Sci ; 27(1): e13431, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-37403418

RESUMEN

As reading is inherently a multisensory, audiovisual (AV) process where visual symbols (i.e., letters) are connected to speech sounds, the question has been raised whether individuals with reading difficulties, like children with developmental dyslexia (DD), have broader impairments in multisensory processing. This question has been posed before, yet it remains unanswered due to (a) the complexity and contentious etiology of DD along with (b) lack of consensus on developmentally appropriate AV processing tasks. We created an ecologically valid task for measuring multisensory AV processing by leveraging the natural phenomenon that speech perception improves when listeners are provided visual information from mouth movements (particularly when the auditory signal is degraded). We designed this AV processing task with low cognitive and linguistic demands such that children with and without DD would have equal unimodal (auditory and visual) performance. We then collected data in a group of 135 children (age 6.5-15) with an AV speech perception task to answer the following questions: (1) How do AV speech perception benefits manifest in children, with and without DD? (2) Do children all use the same perceptual weights to create AV speech perception benefits, and (3) what is the role of phonological processing in AV speech perception? We show that children with and without DD have equal AV speech perception benefits on this task, but that children with DD rely less on auditory processing in more difficult listening situations to create these benefits and weigh both incoming information streams differently. Lastly, any reported differences in speech perception in children with DD might be better explained by differences in phonological processing than differences in reading skills. RESEARCH HIGHLIGHTS: Children with versus without developmental dyslexia have equal audiovisual speech perception benefits, regardless of their phonological awareness or reading skills. Children with developmental dyslexia rely less on auditory performance to create audiovisual speech perception benefits. Individual differences in speech perception in children might be better explained by differences in phonological processing than differences in reading skills.

Asunto(s)

Dislexia , Percepción del Habla , Niño , Humanos , Adolescente , Dislexia/psicología , Lectura , Fonética , Concienciación

3.

EEG-based auditory attention decoding with audiovisual speech for hearing-impaired listeners.

Wang, Bo; Xu, Xiran; Niu, Yadong; Wu, Chao; Wu, Xihong; Chen, Jing.

Cereb Cortex ; 33(22): 10972-10983, 2023 11 04.

Artículo en Inglés | MEDLINE | ID: mdl-37750333

RESUMEN

Auditory attention decoding (AAD) was used to determine the attended speaker during an auditory selective attention task. However, the auditory factors modulating AAD remained unclear for hearing-impaired (HI) listeners. In this study, scalp electroencephalogram (EEG) was recorded with an auditory selective attention paradigm, in which HI listeners were instructed to attend one of the two simultaneous speech streams with or without congruent visual input (articulation movements), and at a high or low target-to-masker ratio (TMR). Meanwhile, behavioral hearing tests (i.e. audiogram, speech reception threshold, temporal modulation transfer function) were used to assess listeners' individual auditory abilities. The results showed that both visual input and increasing TMR could significantly enhance the cortical tracking of the attended speech and AAD accuracy. Further analysis revealed that the audiovisual (AV) gain in attended speech cortical tracking was significantly correlated with listeners' auditory amplitude modulation (AM) sensitivity, and the TMR gain in attended speech cortical tracking was significantly correlated with listeners' hearing thresholds. Temporal response function analysis revealed that subjects with higher AM sensitivity demonstrated more AV gain over the right occipitotemporal and bilateral frontocentral scalp electrodes.

Asunto(s)

Pérdida Auditiva , Percepción del Habla , Humanos , Habla , Percepción del Habla/fisiología , Audición/fisiología , Electroencefalografía , Atención/fisiología , Umbral Auditivo/fisiología

4.

Neural oscillations reflect the individual differences in the temporal perception of audiovisual speech.

Jiang, Zeliang; An, Xingwei; Liu, Shuang; Yin, Erwei; Yan, Ye; Ming, Dong.

Cereb Cortex ; 33(20): 10575-10583, 2023 10 09.

Artículo en Inglés | MEDLINE | ID: mdl-37727958

RESUMEN

Multisensory integration occurs within a limited time interval between multimodal stimuli. Multisensory temporal perception varies widely among individuals and involves perceptual synchrony and temporal sensitivity processes. Previous studies explored the neural mechanisms of individual differences for beep-flash stimuli, whereas there was no study for speech. In this study, 28 subjects (16 male) performed an audiovisual speech/ba/simultaneity judgment task while recording their electroencephalography. We examined the relationship between prestimulus neural oscillations (i.e. the pre-pronunciation movement-related oscillations) and temporal perception. The perceptual synchrony was quantified using the Point of Subjective Simultaneity and temporal sensitivity using the Temporal Binding Window. Our results revealed dissociated neural mechanisms for individual differences in Temporal Binding Window and Point of Subjective Simultaneity. The frontocentral delta power, reflecting top-down attention control, is positively related to the magnitude of individual auditory leading Temporal Binding Windows (auditory Temporal Binding Windows; LTBWs), whereas the parieto-occipital theta power, indexing bottom-up visual temporal attention specific to speech, is negatively associated with the magnitude of individual visual leading Temporal Binding Windows (visual Temporal Binding Windows; RTBWs). In addition, increased left frontal and bilateral temporoparietal occipital alpha power, reflecting general attentional states, is associated with increased Points of Subjective Simultaneity. Strengthening attention abilities might improve the audiovisual temporal perception of speech and further impact speech integration.

Asunto(s)

Percepción del Habla , Percepción del Tiempo , Humanos , Masculino , Percepción Auditiva , Percepción Visual , Habla , Individualidad , Estimulación Acústica , Estimulación Luminosa

5.

Exploring audiovisual speech perception in monolingual and bilingual children in Uzbekistan.

Nematova, Shakhlo; Zinszer, Benjamin; Jasinska, Kaja K.

J Exp Child Psychol ; 239: 105808, 2024 03.

Artículo en Inglés | MEDLINE | ID: mdl-37972516

RESUMEN

This study aimed to investigate the development of audiovisual speech perception in monolingual Uzbek-speaking and bilingual Uzbek-Russian-speaking children, focusing on the impact of language experience on audiovisual speech perception and the role of visual phonetic (i.e., mouth movements corresponding to phonetic/lexical information) and temporal (i.e., timing of speech signals) cues. A total of 321 children aged 4 to 10 years in Tashkent, Uzbekistan, discriminated /ba/ and /da/ syllables across three conditions: auditory-only, audiovisual phonetic (i.e., sound accompanied by mouth movements), and audiovisual temporal (i.e., sound onset/offset accompanied by mouth opening/closing). Effects of modality (audiovisual phonetic, audiovisual temporal, or audio-only cues), age, group (monolingual or bilingual), and their interactions were tested using a Bayesian regression model. Overall, older participants performed better than younger participants. Participants performed better in the audiovisual phonetic modality compared with the auditory modality. However, no significant difference between monolingual and bilingual children was observed across all modalities. This finding stands in contrast to earlier studies. We attribute the contrasting findings of our study and the existing literature to the cross-linguistic similarity of the language pairs involved. When the languages spoken by bilinguals exhibit substantial linguistic similarity, there may be an increased necessity to disambiguate speech signals, leading to a greater reliance on audiovisual cues. The limited phonological similarity between Uzbek and Russian might have minimized bilinguals' need to rely on visual speech cues, contributing to the lack of group differences in our study.

Asunto(s)

Multilingüismo , Percepción del Habla , Niño , Humanos , Uzbekistán , Teorema de Bayes , Fonética , Habla

6.

Speech perception: Auditory and visual cue integration in children with and without phonological disorder in voiceless fricatives.

de Assis, Mayara Ferreira; Berti, Larissa Cristina.

Clin Linguist Phon ; : 1-17, 2024 Apr 01.

Artículo en Inglés | MEDLINE | ID: mdl-38560916

RESUMEN

The literature reports contradictory results regarding the influence of visual cues on speech perception tasks in children with phonological disorder (PD). This study aimed to compare the performance of children with (n = 15) and without PD (n = 15) in audiovisual perception task in voiceless fricatives. Assuming that PD could be associated with an inability to integrate phonological information from two sensory sources, we presumed that children with PD would present difficulties in integrating auditory and visual cues compared to typical children. A syllable identification task was conducted. The stimuli were presented according to four conditions: auditory-only (AO); visual-only (VO); audiovisual congruent (AV+); and audiovisual incongruent (AV-). The percentages of correct answers and the respective reaction times in the AO, VO, and AV+ conditions were considered for the analysis. The correct percentage of auditory stimuli was considered for the AV- condition, as well as the percentage of perceptual preference: auditory, visual, and/or illusion (McGurk effect), with the respective reaction time. In comparing the four conditions, children with PD presented a lower number of correct answers and longer reaction time than children with typical development, mainly for the VO. Both groups showed a preference for auditory stimuli for the AV- condition. However, children with PD showed higher percentages for visual perceptual preference and the McGurk effect than typical children. The superiority of typical children over PD children in auditory-visual speech perception depends on type of stimuli and condition of presentation.

7.

Increasing audiovisual speech integration in autism through enhanced attention to mouth.

Feng, Shuyuan; Wang, Qiandong; Hu, Yixiao; Lu, Haoyang; Li, Tianbi; Song, Ci; Fang, Jing; Chen, Lihan; Yi, Li.

Dev Sci ; 26(4): e13348, 2023 07.

Artículo en Inglés | MEDLINE | ID: mdl-36394129

RESUMEN

Autistic children (AC) show less audiovisual speech integration in the McGurk task, which correlates with their reduced mouth-looking time. The present study examined whether AC's less audiovisual speech integration in the McGurk task could be increased by increasing their mouth-looking time. We recruited 4- to 8-year-old AC and nonautistic children (NAC). In two experiments, we manipulated children's mouth-looking time, measured their audiovisual speech integration by employing the McGurk effect paradigm, and tracked their eye movements. In Experiment 1, we blurred the eyes in McGurk stimuli and compared children's performances in blurred-eyes and clear-eyes conditions. In Experiment 2, we cued children's attention to either the mouth or eyes of McGurk stimuli or asked them to view the McGurk stimuli freely. We found that both blurring the speaker's eyes and cuing to the speaker's mouth increased mouth-looking time and increased audiovisual speech integration in the McGurk task in AC. In addition, we found that blurring the speaker's eyes and cuing to the speaker's mouth also increased mouth-looking time in NAC, but neither blurring the speaker's eyes nor cuing to the speaker's mouth increased their audiovisual speech integration in the McGurk task. Our findings suggest that audiovisual speech integration in the McGurk task in AC could be increased by increasing their attention to the mouth. Our findings contribute to a deeper understanding of relations between face attention and audiovisual speech integration, and provide insights for the development of professional supports to increase audiovisual speech integration in AC. HIGHLIGHTS: The present study examined whether audiovisual speech integration in the McGurk task in AC could be increased by increasing their attention to the speaker's mouth. Blurring the speaker's eyes increased mouth-looking time and audiovisual speech integration in the McGurk task in AC. Cuing to the speaker's mouth also increased mouth-looking time and audiovisual speech integration in the McGurk task in AC. Audiovisual speech integration in the McGurk task in AC could be increased by increasing their attention to the speaker's mouth.

Asunto(s)

Trastorno Autístico , Percepción del Habla , Niño , Humanos , Preescolar , Habla , Movimientos Oculares , Boca , Percepción Visual

8.

Attentional differences in audiovisual face perception between full- and preterm very low birthweight toddlers.

Nakagawa, Atsuko; Sukigara, Masune; Nomura, Kayo; Nagai, Yukiyo; Miyachi, Taishi.

Acta Paediatr ; 112(8): 1715-1724, 2023 08.

Artículo en Inglés | MEDLINE | ID: mdl-37183574

RESUMEN

AIM: To investigate whether rightward attention to the mouth during audiovisual speech perception may be a behavioural marker for early brain development, we studied very preterm and low birthweight (VLBW) and typically developing (TD) toddlers. METHODS: We tested the distribution of gaze points in Japanese-learning TD and VLBW toddlers when exposed to talking, silent and mouth moving faces at 12, 18 and 24 months (corrected age). Each participant was categorised based upon the area they gazed at most (Eye-Right, Eye-Left, Mouth-Right, Mouth-Left) per stimulus per age. A log-linear model was applied to three-dimensional contingency tables (region, side and group). RESULTS: VLBW toddlers showed fewer gaze points than TD toddlers. At 12 months, more VLBW toddlers than TD toddlers showed left attentional bias toward any one face; however, this difference in attention asymmetry receded somewhat by 24 months. In talking condition, TD toddlers showed right attentional bias from 12 to 24 months, whereas VLBW toddlers showed such bias upon reaching 24 months. Additionally, more TD toddlers than VLBW toddlers attended to the mouth. CONCLUSION: Delays in exhibiting the attentional bias for an audiovisual face or general faces displayed by typically developing children might suggest differential developmental timing for hemispheric specialisation or dominance.

Asunto(s)

Reconocimiento Facial , Recién Nacido , Humanos , Preescolar , Recién Nacido de muy Bajo Peso , Cara , Ojo , Aprendizaje

9.

Vision perceptually restores auditory spectral dynamics in speech.

Plass, John; Brang, David; Suzuki, Satoru; Grabowecky, Marcia.

Proc Natl Acad Sci U S A ; 117(29): 16920-16927, 2020 07 21.

Artículo en Inglés | MEDLINE | ID: mdl-32632010

RESUMEN

Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.

Asunto(s)

Acústica del Lenguaje , Percepción del Habla , Percepción Visual , Adulto , Señales (Psicología) , Femenino , Humanos , Labio/fisiología , Masculino , Fonética

10.

The effect of face orientation on audiovisual speech integration in infancy: An electrophysiological study.

Szmytke, Magdalena; Ilyka, Dianna; Duda-Golawska, Joanna; Laudanska, Zuzanna; Malinowska-Korczak, Anna; Tomalski, Przemyslaw.

Dev Psychobiol ; 65(7): e22431, 2023 11.

Artículo en Inglés | MEDLINE | ID: mdl-37860909

RESUMEN

Humans pay special attention to faces and speech from birth, but the interplay of developmental processes leading to specialization is poorly understood. We investigated the effects of face orientation on audiovisual (AV) speech perception in two age groups of infants (younger: 5- to 6.5-month-olds; older: 9- to 10.5-month-olds) and adults. We recorded event-related potentials (ERP) in response to videos of upright and inverted faces producing /ba/ articulation dubbed with auditory syllables that were either matching /ba/ or mismatching /ga/ the mouth movement. We observed an increase in the amplitude of audiovisual mismatch response (AVMMR) to incongruent visual /ba/-auditory /ga/ syllable in comparison to other stimuli in younger infants, while the older group of infants did not show a similar response. AV mismatch response to inverted visual /ba/-auditory /ga/ stimulus relative to congruent stimuli was also detected in the right frontal areas in the younger group and the left and right frontal areas in adults. We show that face configuration affects the neural response to AV mismatch differently across all age groups. The novel finding of the AVMMR in response to inverted incongruent AV speech may potentially imply the featural face processing in younger infants and adults when processing inverted faces articulating incongruent speech. The lack of visible differential responses to upright and inverted incongruent stimuli obtained in the older group of infants suggests a likely functional cortical reorganization in the processing of AV speech.

Asunto(s)

Percepción del Habla , Habla , Adulto , Humanos , Lactante , Habla/fisiología , Percepción Visual/fisiología , Percepción del Habla/fisiología , Potenciales Evocados , Movimiento , Estimulación Acústica

11.

Improving Speech Recognition Performance in Noisy Environments by Enhancing Lip Reading Accuracy.

Li, Dengshi; Gao, Yu; Zhu, Chenyi; Wang, Qianrui; Wang, Ruoxi.

Sensors (Basel) ; 23(4)2023 Feb 11.

Artículo en Inglés | MEDLINE | ID: mdl-36850648

RESUMEN

The current accuracy of speech recognition can reach over 97% on different datasets, but in noisy environments, it is greatly reduced. Improving speech recognition performance in noisy environments is a challenging task. Due to the fact that visual information is not affected by noise, researchers often use lip information to help to improve speech recognition performance. This is where the performance of lip recognition and the effect of cross-modal fusion are particularly important. In this paper, we try to improve the accuracy of speech recognition in noisy environments by improving the lip reading performance and the cross-modal fusion effect. First, due to the same lip possibly containing multiple meanings, we constructed a one-to-many mapping relationship model between lips and speech allowing for the lip reading model to consider which articulations are represented from the input lip movements. Audio representations are also preserved by modeling the inter-relationships between paired audiovisual representations. At the inference stage, the preserved audio representations could be extracted from memory by the learned inter-relationships using only video input. Second, a joint cross-fusion model using the attention mechanism could effectively exploit complementary intermodal relationships, and the model calculates cross-attention weights on the basis of the correlations between joint feature representations and individual modalities. Lastly, our proposed model achieved a 4.0% reduction in WER in a -15 dB SNR environment compared to the baseline method, and a 10.1% reduction in WER compared to speech recognition. The experimental results show that our method could achieve a significant improvement over speech recognition models in different noise environments.

Asunto(s)

Lectura de los Labios , Percepción del Habla , Humanos , Habla , Aprendizaje , Labio

12.

Improvement of Acoustic Models Fused with Lip Visual Information for Low-Resource Speech.

Yu, Chongchong; Yu, Jiaqi; Qian, Zhaopeng; Tan, Yuchen.

Sensors (Basel) ; 23(4)2023 Feb 12.

Artículo en Inglés | MEDLINE | ID: mdl-36850669

RESUMEN

Endangered language generally has low-resource characteristics, as an immaterial cultural resource that cannot be renewed. Automatic speech recognition (ASR) is an effective means to protect this language. However, for low-resource language, native speakers are few and labeled corpora are insufficient. ASR, thus, suffers deficiencies including high speaker dependence and over fitting, which greatly harms the accuracy of recognition. To tackle the deficiencies, the paper puts forward an approach of audiovisual speech recognition (AVSR) based on LSTM-Transformer. The approach introduces visual modality information including lip movements to reduce the dependence of acoustic models on speakers and the quantity of data. Specifically, the new approach, through the fusion of audio and visual information, enhances the expression of speakers' feature space, thus achieving the speaker adaptation that is difficult in a single modality. The approach also includes experiments on speaker dependence and evaluates to what extent audiovisual fusion is dependent on speakers. Experimental results show that the CER of AVSR is 16.9% lower than those of traditional models (optimal performance scenario), and 11.8% lower than that for lip reading. The accuracy for recognizing phonemes, especially finals, improves substantially. For recognizing initials, the accuracy improves for affricates and fricatives where the lip movements are obvious and deteriorates for stops where the lip movements are not obvious. In AVSR, the generalization onto different speakers is also better than in a single modality and the CER can drop by as much as 17.2%. Therefore, AVSR is of great significance in studying the protection and preservation of endangered languages through AI.

Asunto(s)

Aclimatación , Habla , Acústica , Suministros de Energía Eléctrica , Lenguaje

13.

Developmental change in children's speech processing of auditory and visual cues: An eyetracking study.

Zamuner, Tania S; Rabideau, Theresa; McDonald, Margarethe; Yeung, H Henny.

J Child Lang ; 50(1): 27-51, 2023 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-36503546

RESUMEN

This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between apparent successes of visual speech processing in young children in visual-looking tasks, with apparent difficulties of speech processing in older children from explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.

Asunto(s)

Lectura de los Labios , Percepción del Habla , Adulto , Niño , Humanos , Preescolar , Habla , Desarrollo del Lenguaje , Señales (Psicología)

14.

Lip movements enhance speech representations and effective connectivity in auditory dorsal stream.

Zhang, Lei; Du, Yi.

Neuroimage ; 257: 119311, 2022 08 15.

Artículo en Inglés | MEDLINE | ID: mdl-35589000

RESUMEN

Viewing speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed this question by quantifying regional multivariate representation and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and supramarginal gyrus (SMG). Moreover, neural representations of place of articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area while place of articulation better encoded in left ventral premotor cortex and SMG. Next, dynamic causal modeling (DCM) analysis showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Moreover, the neurite orientation dispersion of the left arcuate fasciculus, the bearing skeleton of auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide novel insight to speech science that lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway and the functional enhancement is mediated by the microstructural architecture of the circuit.

Asunto(s)

Corteza Auditiva , Percepción del Habla , Estimulación Acústica , Percepción Auditiva , Mapeo Encefálico , Humanos , Labio , Habla

15.

Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker.

Haider, Chandra Leon; Suess, Nina; Hauswald, Anne; Park, Hyojin; Weisz, Nathan.

Neuroimage ; 252: 119044, 2022 05 15.

Artículo en Inglés | MEDLINE | ID: mdl-35240298

RESUMEN

Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. But it still remains inconclusive which levels of speech processing are affected under which circumstances by occluding the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal hearing participants via magnetoencephalography (MEG). We additionally added a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model on the clear AV speech was trained and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher level features of speech segmentation (phoneme and word onsets) were especially impaired through masks in difficult listening situations. As we used surgical face masks in our study, which only show mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results, by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.

Asunto(s)

Percepción del Habla , Habla , Estimulación Acústica , Acústica , Humanos , Boca , Percepción Visual

16.

Attention to audiovisual speech does not facilitate language acquisition in infants with familial history of autism.

Chawarska, Katarzyna; Lewkowicz, David; Feiner, Hannah; Macari, Suzanne; Vernetti, Angelina.

J Child Psychol Psychiatry ; 63(12): 1466-1476, 2022 12.

Artículo en Inglés | MEDLINE | ID: mdl-35244219

RESUMEN

BACKGROUND: Due to familial liability, siblings of children with ASD exhibit elevated risk for language delays. The processes contributing to language delays in this population remain unclear. METHODS: Considering well-established links between attention to dynamic audiovisual cues inherent in a speaker's face and speech processing, we investigated if attention to a speaker's face and mouth differs in 12-month-old infants at high familial risk for ASD but without ASD diagnosis (hr-sib; n = 91) and in infants at low familial risk (lr-sib; n = 62) for ASD and whether attention at 12 months predicts language outcomes at 18 months. RESULTS: At 12 months, hr-sib and lr-sib infants did not differ in attention to face (p = .14), mouth preference (p = .30), or in receptive and expressive language scores (p = .36, p = .33). At 18 months, the hr-sib infants had lower receptive (p = .01) but not expressive (p = .84) language scores than the lr-sib infants. In the lr-sib infants, greater attention to the face (p = .022) and a mouth preference (p = .025) contributed to better language outcomes at 18 months. In the hr-sib infants, neither attention to the face nor a mouth preference was associated with language outcomes at 18 months. CONCLUSIONS: Unlike low-risk infants, high-risk infants do not appear to benefit from audiovisual prosodic and speech cues in the service of language acquisition despite intact attention to these cues. We propose that impaired processing of audiovisual cues may constitute the link between genetic risk factors and poor language outcomes observed across the autism risk spectrum and may represent a promising endophenotype in autism.

Asunto(s)

Trastorno del Espectro Autista , Trastorno Autístico , Trastornos del Desarrollo del Lenguaje , Lactante , Niño , Humanos , Habla , Predisposición Genética a la Enfermedad , Desarrollo del Lenguaje

17.

Noise-Robust Multimodal Audio-Visual Speech Recognition System for Speech-Based Interaction Applications.

Jeon, Sanghun; Kim, Mun Sang.

Sensors (Basel) ; 22(20)2022 Oct 12.

Artículo en Inglés | MEDLINE | ID: mdl-36298089

RESUMEN

Speech is a commonly used interaction-recognition technique in edutainment-based systems and is a key technology for smooth educational learning and user-system interaction. However, its application to real environments is limited owing to the various noise disruptions in real environments. In this study, an audio and visual information-based multimode interaction system is proposed that enables virtual aquarium systems that use speech to interact to be robust to ambient noise. For audio-based speech recognition, a list of words recognized by a speech API is expressed as word vectors using a pretrained model. Meanwhile, vision-based speech recognition uses a composite end-to-end deep neural network. Subsequently, the vectors derived from the API and vision are classified after concatenation. The signal-to-noise ratio of the proposed system was determined based on data from four types of noise environments. Furthermore, it was tested for accuracy and efficiency against existing single-mode strategies for extracting visual features and audio speech recognition. Its average recognition rate was 91.42% when only speech was used, and improved by 6.7% to 98.12% when audio and visual information were combined. This method can be helpful in various real-world settings where speech recognition is regularly utilized, such as cafés, museums, music halls, and kiosks.

Asunto(s)

Percepción del Habla , Habla , Software de Reconocimiento del Habla , Ruido , Relación Señal-Ruido

18.

Crossmodal Phase Reset and Evoked Responses Provide Complementary Mechanisms for the Influence of Visual Speech in Auditory Cortex.

Mégevand, Pierre; Mercier, Manuel R; Groppe, David M; Zion Golumbic, Elana; Mesgarani, Nima; Beauchamp, Michael S; Schroeder, Charles E; Mehta, Ashesh D.

J Neurosci ; 40(44): 8530-8542, 2020 10 28.

Artículo en Inglés | MEDLINE | ID: mdl-33023923

RESUMEN

Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs.SIGNIFICANCE STATEMENT Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied these mechanisms by recording the electrical activity of the human brain through electrodes implanted surgically inside the brain. We found that visual inputs can operate by directly activating auditory cortical areas, and also indirectly by modulating the strength of cortical responses to auditory input. Our results help to understand the mechanisms by which the brain merges auditory and visual speech into a unitary perception.

Asunto(s)

Corteza Auditiva/fisiología , Potenciales Evocados/fisiología , Comunicación no Verbal/fisiología , Adulto , Epilepsia Refractaria/cirugía , Electrocorticografía , Potenciales Evocados Auditivos/fisiología , Potenciales Evocados Visuales/fisiología , Femenino , Humanos , Persona de Mediana Edad , Neuronas/fisiología , Comunicación no Verbal/psicología , Estimulación Luminosa , Adulto Joven

19.

Breaking down the cocktail party: Attentional modulation of cerebral audiovisual speech processing.

Wikman, Patrik; Sahari, Elisa; Salmela, Viljami; Leminen, Alina; Leminen, Miika; Laine, Matti; Alho, Kimmo.

Neuroimage ; 224: 117365, 2021 01 01.

Artículo en Inglés | MEDLINE | ID: mdl-32941985

RESUMEN

Recent studies utilizing electrophysiological speech envelope reconstruction have sparked renewed interest in the cocktail party effect by showing that auditory neurons entrain to selectively attended speech. Yet, the neural networks of attention to speech in naturalistic audiovisual settings with multiple sound sources remain poorly understood. We collected functional brain imaging data while participants viewed audiovisual video clips of lifelike dialogues with concurrent distracting speech in the background. Dialogues were presented in a full-factorial design, comprising task (listen to the dialogues vs. ignore them), audiovisual quality and semantic predictability. We used univariate analyses in combination with multivariate pattern analysis (MVPA) to study modulations of brain activity related to attentive processing of audiovisual speech. We found attentive speech processing to cause distinct spatiotemporal modulation profiles in distributed cortical areas including sensory and frontal-control networks. Semantic coherence modulated attention-related activation patterns in the earliest stages of auditory cortical processing, suggesting that the auditory cortex is involved in high-level speech processing. Our results corroborate views that emphasize the dynamic nature of attention, with task-specificity and context as cornerstones of the underlying neuro-cognitive mechanisms.

Asunto(s)

Corteza Auditiva/fisiología , Percepción Auditiva/fisiología , Percepción del Habla/fisiología , Habla/fisiología , Estimulación Acústica/métodos , Adulto , Encéfalo/fisiología , Mapeo Encefálico/métodos , Femenino , Humanos , Imagen por Resonancia Magnética/métodos , Masculino , Percepción Visual/fisiología , Adulto Joven

20.

Event-related potentials evidence for long-term audiovisual representations of phonemes in adults.

Kaganovich, Natalya; Christ, Sharon.

Eur J Neurosci ; 54(11): 7860-7875, 2021 12.

Artículo en Inglés | MEDLINE | ID: mdl-34750895

RESUMEN

The presence of long-term auditory representations for phonemes has been well-established. However, since speech perception is typically audiovisual, we hypothesized that long-term phoneme representations may also contain information on speakers' mouth shape during articulation. We used an audiovisual oddball paradigm in which, on each trial, participants saw a face and heard one of two vowels. One vowel occurred frequently (standard), while another occurred rarely (deviant). In one condition (neutral), the face had a closed, non-articulating mouth. In the other condition (audiovisual violation), the mouth shape matched the frequent vowel. Although in both conditions stimuli were audiovisual, we hypothesized that identical auditory changes would be perceived differently by participants. Namely, in the neutral condition, deviants violated only the audiovisual pattern specific to each block. By contrast, in the audiovisual violation condition, deviants additionally violated long-term representations for how a speaker's mouth looks during articulation. We compared the amplitude of mismatch negativity (MMN) and P3 components elicited by deviants in the two conditions. The MMN extended posteriorly over temporal and occipital sites even though deviants contained no visual changes, suggesting that deviants were perceived as interruptions in audiovisual, rather than auditory only, sequences. As predicted, deviants elicited larger MMN and P3 in the audiovisual violation compared to the neutral condition. The results suggest that long-term representations of phonemes are indeed audiovisual.

Asunto(s)

Potenciales Evocados , Percepción del Habla , Estimulación Acústica , Adulto , Electroencefalografía , Potenciales Evocados Auditivos , Cara , Humanos , Boca

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

Detalles de la búsqueda