ABSTRACT
OBJECTIVES: Speech perception training can be a highly effective intervention to improve perception and language abilities in children who are deaf or hard of hearing. Most studies of speech perception training, however, only measure gains immediately following training. Only a minority of cases include a follow-up assessment after a period without training. A critical unanswered question was whether training-related benefits are retained for a period of time after training has stopped. A primary goal of this investigation was to determine whether children retained training-related benefits 4 to 6 weeks after they completed 16 hours of formal speech perception training. Training consisted of auditory training, speechreading training, or a combination of both. It is also important to determine whether "booster" training can help increase gains made during the initial intensive training period. Another goal of the study was therefore to investigate the benefits of providing home-based booster training during the 4- to 6-week interval after the formal training ceased. The original investigation (Tye-Murray et al. 2022) compared the effects of talker familiarity and the relative benefits of the different types of training. We predicted that the children who received no additional training would retain the gains after completing the formal training. We also predicted that those children who completed the booster training would realize additional gains. DESIGN: Children, 6 to 12 years old, with hearing loss who had previously participated in the original randomized controlled study returned 4 to 6 weeks after its conclusion to take a follow-up speech perception assessment. The first group (n = 44) returned after receiving no formal intervention from the research team before the follow-up assessment. A second group of 40 children completed an additional 16 hours of speech perception training at home during a 4- to 6-week interval before the follow-up speech perception assessment. The home-based speech perception training was a continuation of the same training that was received in the laboratory, formatted to work on a PC tablet with a portable speaker. The follow-up speech perception assessment included measures of listening and speechreading, with test items spoken by both familiar (trained) and unfamiliar (untrained) talkers. RESULTS: In the group that did not receive the booster training, follow-up testing showed retention of all gains that were obtained immediately following the laboratory-based training. The group that received booster training during the same interval also maintained the benefits from the formal training, with some indication of minor improvement. CONCLUSIONS: Clinically, the present findings are extremely encouraging; the group that did not receive home-based booster training retained the benefits obtained during the laboratory-based training regimen. Moreover, the results suggest that self-paced booster training maintained the relative training gains associated with talker familiarity and training type seen immediately following laboratory-based training. Future aural rehabilitation programs should include maintenance training at home to supplement the speech perception training conducted under more formal conditions at school or in the clinic.
Subject(s)
Correction of Hearing Impairment, Deafness, Hearing Loss, Speech Perception, Child, Humans, Hearing Loss/rehabilitation, Lipreading, Correction of Hearing Impairment/methods
ABSTRACT
Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explored phase-locking to auditory and visual signals in MEG recordings from 14 human participants (6 females, 8 males) who reported words from single spoken sentences. We manipulated the acoustic clarity and visual speech signals such that critical speech information was present in auditory, visual, or both modalities. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase-locking to the auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase-locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only speech that was matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus did not show above-chance partial coherence with visual speech signals during AV conditions but did show partial coherence in visual-only conditions. Hence, visual speech enabled stronger phase-locking to auditory signals in visual areas, whereas phase-locking of visual speech in auditory regions only occurred during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception. SIGNIFICANCE STATEMENT: Verbal communication in noisy environments is challenging, especially for hearing-impaired individuals. Seeing facial movements of communication partners improves speech perception when auditory signals are degraded or absent. The neural mechanisms supporting lip-reading or audio-visual benefit are not fully understood. Using MEG recordings and partial coherence analysis, we show that speech information is used differently in brain regions that respond to auditory and visual speech. While visual areas use visual speech to improve phase-locking to auditory speech signals, auditory areas do not show phase-locking to visual speech unless auditory speech is absent and visual speech is used to substitute for missing auditory signals. These findings highlight brain processes that combine visual and auditory signals to support speech understanding.
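The coherence and partial-coherence logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: toy sinusoidal envelopes stand in for the study's speech envelope, lip aperture, and source-reconstructed MEG, and Welch-style spectral estimates replace the authors' actual pipeline.

```python
"""Sketch: coherence and partial coherence between a brain signal and
correlated auditory/visual speech signals.  All signals are simulated."""
import numpy as np
from scipy.signal import welch, csd

fs = 250                      # sampling rate (Hz), assumed
t = np.arange(0, 60, 1 / fs)  # 60 s of toy data
rng = np.random.default_rng(0)

# Syllable-rate (2-6 Hz) toy envelopes; lip aperture is correlated with audio
audio_env = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 5 * t) + 0.2 * rng.standard_normal(t.size)
lip_aperture = 0.8 * audio_env + 0.6 * rng.standard_normal(t.size)
meg = 0.5 * audio_env + 0.3 * lip_aperture + rng.standard_normal(t.size)

nper = 4 * fs  # 4 s Welch segments -> 0.25 Hz frequency resolution

def coherency(x, y):
    """Complex coherency gamma_xy(f) = S_xy / sqrt(S_xx * S_yy)."""
    f, sxy = csd(x, y, fs=fs, nperseg=nper)
    _, sxx = welch(x, fs=fs, nperseg=nper)
    _, syy = welch(y, fs=fs, nperseg=nper)
    return f, sxy / np.sqrt(sxx * syy)

f, g_ma = coherency(meg, audio_env)            # MEG vs auditory envelope
_, g_mv = coherency(meg, lip_aperture)         # MEG vs lip aperture
_, g_av = coherency(audio_env, lip_aperture)   # audio vs lips

# Partial coherency of MEG with audio after removing what lips can explain
g_ma_given_v = (g_ma - g_mv * np.conj(g_av)) / np.sqrt(
    (1 - np.abs(g_mv) ** 2) * (1 - np.abs(g_av) ** 2)
)

band = (f >= 2) & (f <= 6)  # syllable-rate band used in the study
print("2-6 Hz coherence (MEG, audio):      ", np.abs(g_ma[band]).mean().round(3))
print("2-6 Hz partial coherence given lips:", np.abs(g_ma_given_v[band]).mean().round(3))
```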
Subject(s)
Auditory Cortex, Speech Perception, Visual Cortex, Acoustic Stimulation, Auditory Cortex/physiology, Auditory Perception, Female, Humans, Lipreading, Male, Speech/physiology, Speech Perception/physiology, Visual Cortex/physiology, Visual Perception/physiology
ABSTRACT
In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here, we used fMRI to monitor brain activity while adult humans (n = 60) were presented with visual-only, auditory-only, and audiovisual words. The audiovisual words were presented in quiet and in several signal-to-noise ratios. As expected, audiovisual speech perception recruited both auditory and visual cortex, with some evidence for increased recruitment of premotor cortex in some conditions (including in substantial background noise). We then investigated neural connectivity using psychophysiological interaction analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. In addition to whole-brain analyses, we also conducted a region-of-interest analysis on the left posterior superior temporal sulcus (pSTS), implicated in many previous studies of audiovisual speech perception. We found evidence for both activity and effective connectivity in pSTS for visual-only and audiovisual speech, although these were not significant in whole-brain analyses. Together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech that complements activity in integrative brain regions like pSTS. SIGNIFICANCE STATEMENT: In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is hard to understand (e.g., background noise). Prior work has suggested that specialized regions of the brain may play a critical role in integrating information from visual and auditory speech. Here, we show a complementary mechanism relying on synchronized brain activity among sensory and motor regions may also play a critical role. These findings encourage reconceptualizing audiovisual integration in the context of coordinated network activity.
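The psychophysiological interaction (PPI) analysis referred to above can be illustrated with a toy regression: the interaction regressor is the product of a psychological (task) regressor and a physiological (seed) time course, and its coefficient indexes condition-dependent coupling. The sketch below is a simplified, simulated version that omits deconvolution of the hemodynamic response; all names and effect sizes are assumptions, not the authors' pipeline.

```python
"""Sketch: a simplified PPI (psychophysiological interaction) regression
with simulated time courses; real pipelines deconvolve the seed signal."""
import numpy as np

rng = np.random.default_rng(1)
n_vols = 200

# Psychological regressor: +1 for audiovisual blocks, -1 for auditory-only blocks
task = np.tile(np.repeat([1.0, -1.0], 20), n_vols // 40)

# Physiological regressor: seed time course, e.g. primary auditory cortex (simulated)
seed = rng.standard_normal(n_vols)

# PPI term: element-wise product of the psychological and physiological regressors
ppi = task * seed

# Target region (e.g. visual cortex): coupled to the seed more strongly during AV blocks
target = 0.2 * task + 0.3 * seed + 0.5 * ppi + rng.standard_normal(n_vols)

X = np.column_stack([np.ones(n_vols), task, seed, ppi])   # GLM design matrix
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
print("betas [intercept, task, seed, PPI]:", beta.round(2))
# A reliably positive PPI beta indicates stronger seed-target coupling in
# audiovisual than in unimodal conditions, i.e. condition-dependent connectivity.
```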
Subject(s)
Auditory Cortex/physiology, Language, Lipreading, Nerve Net/physiology, Speech Perception/physiology, Visual Cortex/physiology, Visual Perception/physiology, Adult, Aged, Aged, 80 and over, Auditory Cortex/diagnostic imaging, Female, Humans, Magnetic Resonance Imaging, Male, Middle Aged, Nerve Net/diagnostic imaging, Visual Cortex/diagnostic imaging, Young Adult
ABSTRACT
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
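Forward encoding models of this kind ask how much EEG variance the viseme features explain beyond the low-level features. The Python sketch below is a toy version under simplifying assumptions: simulated signals, a single train/test split, and a fixed ridge penalty, rather than the authors' cross-validated, source-localized analysis.

```python
"""Sketch: time-lagged ridge ("temporal response function") encoding models
comparing low-level visual features alone against features plus visemes.
All signals are simulated."""
import numpy as np
from numpy.linalg import solve

fs = 64                  # common feature/EEG sampling rate (Hz), assumed
n = fs * 120             # two minutes of data
rng = np.random.default_rng(2)

motion = rng.standard_normal(n)                        # frame-wise motion energy
lip_aperture = rng.standard_normal(n)                  # lip opening
visemes = (rng.random((n, 5)) < 0.05).astype(float)    # five one-hot viseme channels

def lagged(X, max_lag):
    """Stack copies of X delayed by 0..max_lag samples (time-lagged design matrix)."""
    cols = [np.roll(X, lag, axis=0) for lag in range(max_lag + 1)]
    for lag, c in enumerate(cols):
        c[:lag] = 0.0          # zero the samples that wrapped around
    return np.hstack(cols)

max_lag = int(0.4 * fs)        # model lags from 0 to ~400 ms

X_low = lagged(np.column_stack([motion, lip_aperture]), max_lag)
X_full = np.hstack([X_low, lagged(visemes, max_lag)])

# Simulated EEG channel that genuinely responds to one viseme channel (~150 ms lag)
eeg = 0.5 * np.roll(visemes[:, 0], 10) + rng.standard_normal(n)

def ridge_prediction_r(X, y, lam=1.0):
    half = n // 2              # first half train, second half test
    Xtr, Xte, ytr, yte = X[:half], X[half:], y[:half], y[half:]
    w = solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
    return np.corrcoef(Xte @ w, yte)[0, 1]

print("prediction r, motion + lips    :", round(ridge_prediction_r(X_low, eeg), 3))
print("prediction r, + viseme features:", round(ridge_prediction_r(X_full, eeg), 3))
```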
Subject(s)
Auditory Cortex, Speech Perception, Humans, Lipreading, Speech Perception/physiology, Brain/physiology, Auditory Cortex/physiology, Phonetics, Visual Perception/physiology
ABSTRACT
The current accuracy of speech recognition can reach over 97% on different datasets, but it drops sharply in noisy environments. Improving speech recognition performance in noisy environments is a challenging task. Because visual information is not affected by acoustic noise, researchers often use lip information to help improve speech recognition performance, which makes lip-reading performance and the effect of cross-modal fusion particularly important. In this paper, we try to improve the accuracy of speech recognition in noisy environments by improving lip-reading performance and the cross-modal fusion effect. First, because the same lip movement can correspond to multiple pronunciations, we constructed a one-to-many mapping model between lip movements and speech, allowing the lip-reading model to consider which articulations the input lip movements represent. Audio representations are also preserved by modeling the inter-relationships between paired audiovisual representations; at the inference stage, the preserved audio representations can be extracted from memory through the learned inter-relationships using only video input. Second, a joint cross-fusion model using the attention mechanism can effectively exploit complementary intermodal relationships; the model calculates cross-attention weights on the basis of the correlations between joint feature representations and the individual modalities. Lastly, our proposed model achieved a 4.0% reduction in WER in a -15 dB SNR environment compared to the baseline method, and a 10.1% reduction in WER compared to audio-only speech recognition. The experimental results show that our method achieves a significant improvement over speech recognition models in different noise environments.
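The joint cross-fusion step, attention applied in both directions with weights derived from the joint audiovisual representation, can be sketched with standard building blocks. The PyTorch module below is an illustrative approximation; the layer sizes, gating scheme, and tensor shapes are assumptions, not the paper's exact architecture.

```python
"""Sketch: cross-modal attention fusion of audio and lip-movement features.
Dimensions and the gating scheme are illustrative assumptions."""
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Audio features attend to video features and vice versa
        self.audio_to_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Scalar gates computed from the joint representation weight each direction
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, audio, video):
        # audio, video: (batch, time, dim) frame-level representations
        a_att, _ = self.audio_to_video(query=audio, key=video, value=video)
        v_att, _ = self.video_to_audio(query=video, key=audio, value=audio)
        joint = torch.cat([a_att, v_att], dim=-1)
        w = self.gate(joint)                         # (batch, time, 2) cross-attention weights
        fused = torch.cat([w[..., :1] * a_att, w[..., 1:] * v_att], dim=-1)
        return self.out(fused)                       # (batch, time, dim) fused features

fusion = CrossModalFusion()
audio = torch.randn(2, 75, 256)    # e.g., 3 s of audio features aligned to 25 fps video
video = torch.randn(2, 75, 256)    # matching lip-movement features
print(fusion(audio, video).shape)  # torch.Size([2, 75, 256])
```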
Subject(s)
Lipreading, Speech Perception, Humans, Speech, Learning, Lip
ABSTRACT
This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent success of visual speech processing by young children in visual-looking tasks and the apparent difficulty of speech processing shown by older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both the AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier for /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.
Subject(s)
Lipreading, Speech Perception, Adult, Child, Humans, Child, Preschool, Speech, Language Development, Cues
ABSTRACT
The goal of the current interpretive phenomenological study grounded in Heidegger's philosophies was to explore the experience of lipreaders when society was masked during the coronavirus disease 2019 pandemic. Participants were prelingually deafened English-speaking adults who predominantly relied on lip-reading and speaking for communication. Twelve in-depth email interviews were conducted with respondents recruited via social media. Thematic techniques of Benner were employed, and six themes emerged: Limiting of World Resulting in Negative Emotions, Increased Prominence of Deafness, Balancing Safety and Communication Access, Creative Resourcefulness, Resilience and Personal Growth, and Passage of Time to Bittersweet Freedom. Insights from this study clarify the need for psychosocial support of lipreaders during times of restricted communication access and awareness of accommodations to facilitate inclusion. [Journal of Psychosocial Nursing and Mental Health Services, 61(4), 18-26.].
Subject(s)
COVID-19, Lipreading, Masks, Adult, Humans
ABSTRACT
OBJECTIVES: Transfer-appropriate processing (TAP) refers to a general finding that training gains are maximized when training and testing are conducted under the same conditions. The present study tested the extent to which TAP applies to speech perception training in children with hearing loss. Specifically, we assessed the benefits of computer-based speech perception training games for enhancing children's speech recognition by comparing three training groups: auditory training (AT), audiovisual training (AVT), and a combination of these two (AT/AVT). We also determined whether talker-specific training, as might occur when children train with the speech of next year's classroom teacher, leads to better recognition of that talker's speech and, if so, the extent to which training benefits generalize to untrained talkers. Consistent with TAP theory, we predicted that children would improve their ability to recognize the speech of the trained talker more than that of three untrained talkers and, depending on their training group, would improve more on an auditory-only (listening) or audiovisual (speechreading) speech perception assessment that matched the type of training they received. We also hypothesized that benefit would generalize to untrained talkers and to test modalities in which they did not train, albeit to a lesser extent. DESIGN: Ninety-nine elementary school-aged children with hearing loss were enrolled in a randomized controlled trial with a repeated-measures A-A-B mixed experimental design in which children served as their own control for the assessment of overall benefit of a particular training type, and three different groups of children yielded data for comparing the three types of training. We also assessed talker-specific learning and transfer of learning by including speech perception tests with stimuli spoken by the talker with whom a child trained and stimuli spoken by three talkers with whom the child did not train, and by including speech perception tests that presented both auditory (listening) and audiovisual (speechreading) stimuli. Children received 16 hr of gamified training. The games provided word identification and connected speech comprehension training activities. RESULTS: Overall, children showed significant improvement in both their listening and speechreading performance. Consistent with TAP theory, children improved more on their trained talker than on the untrained talkers. Also consistent with TAP theory, the children who received AT improved more on the listening assessment than on the speechreading assessment. However, children who received AVT improved on both types of assessment equally, which is not consistent with our predictions derived from a TAP perspective. Age, language level, and phonological awareness were either not predictive of training benefits or only negligibly so. CONCLUSIONS: The findings provide support for the practice of providing children who have hearing loss with structured speech perception training and suggest that future aural rehabilitation programs might include teacher-specific speech perception training to prepare children for an upcoming school year, especially since training will generalize to other talkers. The results also suggest that benefits of speech perception training were not significantly related to age, language level, or degree of phonological awareness. The findings are largely consistent with TAP theory, suggesting that the more aligned a training task is with the desired outcome, the more likely benefit will accrue.
Subject(s)
Deafness, Hearing Loss, Speech Perception, Child, Computers, Humans, Lipreading, Speech
ABSTRACT
Lipreading is a technique for analyzing sequences of lip movements and recognizing the speech content of a speaker. Because the structure of our vocal organs limits the number of pronunciations we can make, the same lip movements can correspond to different words, leading to problems with homophones when speaking; conversely, different speakers produce different lip movements for the same word. To address these problems, this paper focuses on spatial-temporal feature extraction in word-level lipreading and proposes an efficient two-stream model that learns the relative dynamic information of lip motion. In this model, two CNN streams with different channel capacities are used to extract static features within single frames and dynamic information across multi-frame sequences, respectively. We explored a more effective convolution structure for each component in the front-end model and improved accuracy by about 8%. Then, based on the characteristics of the word-level lipreading dataset, we further studied the impact of the two sampling methods on the fast and slow channels. Furthermore, we discussed the influence of different methods for fusing the front-end and back-end models under the two-stream network structure. Finally, we evaluated the proposed model on two large-scale lipreading datasets and achieved new state-of-the-art results.
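The two-stream idea, a higher-capacity stream for per-frame (static) appearance paired with a lighter stream that samples frames densely for lip dynamics, can be sketched as follows. Channel widths, the sampling ratio, and the linear back-end are assumptions; only the overall structure mirrors the description above.

```python
"""Sketch: a two-stream spatiotemporal front-end for word-level lipreading.
Layer sizes, sampling rates, and the fusion are illustrative assumptions."""
import torch
import torch.nn as nn

class TwoStreamFrontEnd(nn.Module):
    def __init__(self, n_classes=500):
        super().__init__()
        # Static stream: more channels, temporally subsampled input
        self.static = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(1, 5, 5), stride=(1, 2, 2), padding=(0, 2, 2)),
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Dynamic stream: fewer channels, full frame rate, temporal kernels
        self.dynamic = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(5, 5, 5), stride=(1, 2, 2), padding=(2, 2, 2)),
            nn.BatchNorm3d(16), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64 + 16, n_classes)   # stands in for a GRU/TCN back-end

    def forward(self, clip):
        # clip: (batch, 1, frames, height, width) grayscale mouth crops
        slow = self.static(clip[:, :, ::4])                # sample every 4th frame
        fast = self.dynamic(clip)                          # use all frames
        feat = torch.cat([slow.flatten(1), fast.flatten(1)], dim=1)
        return self.classifier(feat)

model = TwoStreamFrontEnd()
clip = torch.randn(2, 1, 29, 88, 88)    # 29-frame mouth-region clips
print(model(clip).shape)                # torch.Size([2, 500])
```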
Subject(s)
Algorithms, Lipreading, Humans, Learning, Motion, Movement
ABSTRACT
Concomitant with the recent advances in deep learning, automatic speech recognition and visual speech recognition (VSR) have received considerable attention. However, although VSR systems must identify speech from both frontal and profile faces in real-world scenarios, most VSR studies have focused solely on frontal face images. To address this issue, we propose an end-to-end sentence-level multi-view VSR architecture for faces captured from four different perspectives (frontal, 30°, 45°, and 60°). The encoder uses multiple convolutional neural networks with a spatial attention module to detect minor changes in the mouth patterns of similarly pronounced words, and the decoder uses cascaded local self-attention connectionist temporal classification to collect the details of local contextual information in the immediate vicinity, which results in a substantial performance boost and speedy convergence. In experiments on the OuluVS2 dataset, which was divided into the four perspectives, the proposed model improved on the existing state-of-the-art performance by 3.31% (0°), 4.79% (30°), 5.51% (45°), and 6.18% (60°), a mean improvement of 4.95%, and average performance improved by 9.1% compared with the baseline. Thus, the suggested design enhances the performance of multi-view VSR and boosts its usefulness in real-world applications.
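A spatial attention module of the general kind mentioned above re-weights CNN feature maps so that small differences around the mouth stand out. The sketch below follows a common channel-pooling design and is an assumption about the mechanism, not the paper's implementation.

```python
"""Sketch: a generic spatial attention block over visual feature maps."""
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        # Map channel-pooled descriptors to one attention value per spatial location
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feats):
        # feats: (batch, channels, height, width) feature maps from the visual CNN
        avg_pool = feats.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        max_pool, _ = feats.max(dim=1, keepdim=True)        # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return feats * attn                                  # re-weighted feature maps

sa = SpatialAttention()
feats = torch.randn(8, 64, 22, 22)
print(sa(feats).shape)   # torch.Size([8, 64, 22, 22])
```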
Subject(s)
Lipreading, Neural Networks, Computer, Attention, Humans, Language, Speech
ABSTRACT
BACKGROUND: Many people benefit from the additional visual information provided by a speaker's lip movements, but lip reading on its own is very error prone. Algorithms for lip reading with artificial intelligence based on artificial neural networks significantly improve word recognition but are not available for the German language. MATERIALS AND METHODS: A total of 1806 video clips, each containing only one German-speaking person, were selected, split into word segments, and assigned to word classes using speech-recognition software. In 38,391 video segments with 32 speakers, 18 polysyllabic, visually distinguishable words were used to train and validate a neural network. The 3D Convolutional Neural Network and Gated Recurrent Units models, and a combination of both models (GRUConv), were compared, as were different image sections and color spaces of the videos. The accuracy was determined over 5000 training epochs. RESULTS: Comparison of the color spaces did not reveal any relevant differences in correct classification rates, which ranged from 69% to 72%. Cropping to the lips achieved a significantly higher accuracy of 70% than cropping to the speaker's entire face (34%). With the GRUConv model, the maximum accuracies were 87% with known speakers and 63% in validation with unknown speakers. CONCLUSION: The neural network for lip reading, the first developed for the German language, shows a very high level of accuracy, comparable to English-language algorithms. It also works with unknown speakers and can be generalized with more word classes.
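The pairing of a 3D-convolutional front-end with a GRU back-end can be sketched as below. Only the Conv3D-plus-GRU structure and the 18-way word classification follow the description above; layer sizes, pooling, and input resolution are assumptions.

```python
"""Sketch: a Conv3D front-end followed by a GRU back-end for 18-way
word classification from lip-cropped video.  Layer sizes are assumed."""
import torch
import torch.nn as nn

class ConvGRUWordClassifier(nn.Module):
    def __init__(self, n_classes=18, hidden=128):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.BatchNorm3d(32), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        self.gru = nn.GRU(input_size=32, hidden_size=hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, clip):
        # clip: (batch, 1, frames, height, width) lip-cropped video
        x = self.frontend(clip)                    # (B, 32, T, H', W')
        x = x.mean(dim=[3, 4]).transpose(1, 2)     # spatial average -> (B, T, 32)
        out, _ = self.gru(x)
        return self.classifier(out[:, -1])         # class logits from the last time step

model = ConvGRUWordClassifier()
clip = torch.randn(4, 1, 25, 64, 64)    # 25-frame lip crops
print(model(clip).shape)                # torch.Size([4, 18])
```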
Subject(s)
Deep Learning, Language, Algorithms, Artificial Intelligence, Humans, Lipreading
ABSTRACT
This study presents three experiments examining the role of the phonological store component of working memory in the speechreading performance of students with hearing impairment (HI) in China. In Experiment 1, 86 high school students with HI completed an immediate serial recall task with four lists of monosyllabic words that differed in phonological and visual similarity. In Experiments 2 and 3, 40 participants divided into high and low phonological store capacity (PS) groups and 40 participants divided into high and low visual phonological store capacity (VPS) groups completed a speechreading test at the word, phrase and sentence levels. Results revealed that (1) immediate serial recall showed effects of phonological and visual similarity and their interaction; (2) there was no significant effect of phonological store capacity on speechreading; and (3) there was a significant effect of visual phonological store capacity on the accuracy but not the speed of speechreading. These findings point to a general phonological store system for visual orthographic coding and phonological coding that students with HI engage when speechreading in Chinese. They provide evidence for the contention that visual-based coding has a more direct impact on the speechreading performance of Chinese students with HI than speech-based coding.
Subject(s)
Hearing Loss, Speech Perception, Humans, Linguistics, Lipreading, Phonetics, Students
ABSTRACT
Lip-reading is crucial for understanding speech in challenging conditions, but how the brain extracts meaning from silent, visual speech is still under debate. Lip-reading in silence activates the auditory cortices, but it is not known whether such activation reflects immediate synthesis of the corresponding auditory stimulus or imagery of unrelated sounds. To disentangle these possibilities, we used magnetoencephalography to evaluate how cortical activity in 28 healthy adult humans (17 females) entrained to the auditory speech envelope and lip movements (mouth opening) when listening to a spoken story without visual input (audio-only), and when seeing a silent video of a speaker articulating another story (video-only). In video-only, auditory cortical activity entrained to the absent auditory signal at frequencies <1 Hz more than to the seen lip movements. This entrainment process was characterized by an auditory-speech-to-brain delay of ~70 ms in the left hemisphere, compared with ~20 ms in audio-only. Entrainment to mouth opening was found in the right angular gyrus at <1 Hz, and in early visual cortices at 1-8 Hz. These findings demonstrate that the brain can use a silent lip-read signal to synthesize a coarse-grained auditory speech representation in early auditory cortices. Our data indicate the following underlying oscillatory mechanism: seeing lip movements first modulates neuronal activity in early visual cortices at frequencies that match articulatory lip movements; the right angular gyrus then extracts slower features of lip movements, mapping them onto the corresponding speech sound features; this information is fed to auditory cortices, most likely facilitating speech parsing. SIGNIFICANCE STATEMENT: Lip-reading consists of decoding speech based on visual information derived from observation of a speaker's articulatory facial gestures. Lip-reading is known to improve auditory speech understanding, especially when speech is degraded. Interestingly, lip-reading in silence still activates the auditory cortices, even when participants do not know what the absent auditory signal should be. However, it was uncertain what such activation reflected. Here, using magnetoencephalographic recordings, we demonstrate that it reflects fast synthesis of the auditory stimulus rather than mental imagery of unrelated speech or non-speech sounds. Our results also shed light on the oscillatory dynamics underlying lip-reading.
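The idea of a speech-to-brain delay can be illustrated with a simple lag estimate. The Python sketch below cross-correlates a simulated, low-pass-filtered envelope with a delayed, noisy copy of it; the study itself used coherence-based measures on source-reconstructed MEG, so this is only a conceptual stand-in.

```python
"""Sketch: estimating a speech-to-brain delay from the peak of a
cross-correlation between a slow speech envelope and a delayed, noisy
copy of it (the "brain" signal).  Signals are simulated."""
import numpy as np
from scipy.signal import butter, sosfiltfilt, correlate, correlation_lags

fs = 200
t = np.arange(0, 120, 1 / fs)
rng = np.random.default_rng(3)

envelope = rng.standard_normal(t.size).cumsum()    # slow, drifting toy envelope
true_delay = int(0.070 * fs)                       # 70 ms expressed in samples
brain = np.roll(envelope, true_delay) + 2 * rng.standard_normal(t.size)

sos = butter(4, 1.0, btype="low", fs=fs, output="sos")   # keep the <1 Hz range
env_f = sosfiltfilt(sos, envelope)
brain_f = sosfiltfilt(sos, brain)
env_f -= env_f.mean()
brain_f -= brain_f.mean()

xc = correlate(brain_f, env_f, mode="full")
lags = correlation_lags(brain_f.size, env_f.size, mode="full")
best = lags[np.argmax(xc)]                         # positive lag: brain follows speech
print(f"estimated speech-to-brain delay: {1000 * best / fs:.0f} ms")
```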
Subject(s)
Auditory Cortex/physiology, Lipreading, Speech Perception/physiology, Acoustic Stimulation, Female, Humans, Magnetoencephalography, Male, Pattern Recognition, Visual/physiology, Sound Spectrography, Young Adult
ABSTRACT
All it takes is a face-to-face conversation in a noisy environment to realize that viewing a speaker's lip movements contributes to speech comprehension. What are the processes underlying the perception and interpretation of visual speech? Brain areas that control speech production are also recruited during lipreading. This finding raises the possibility that lipreading may be supported, at least to some extent, by covert, unconscious imitation of the observed speech movements in the observer's own speech motor system (a motor simulation). However, whether, and if so to what extent, motor simulation contributes to visual speech interpretation remains unclear. In two experiments, we found that several participants with congenital facial paralysis were as good at lipreading as the control population and performed these tasks in a way that is qualitatively similar to the controls, despite severely reduced or even completely absent lip motor representations. Although it remains an open question whether this conclusion generalizes to other experimental conditions and to typically developed participants, these findings considerably narrow the space of hypotheses for a role of motor simulation in lipreading. Beyond its theoretical significance in the field of speech perception, this finding also calls for a re-examination of the more general hypothesis that motor simulation underlies action perception and interpretation, developed in the frameworks of the motor simulation and mirror neuron hypotheses.
Subject(s)
Lipreading, Speech Perception, Brain Mapping, Comprehension, Humans, Speech
ABSTRACT
All writing systems represent units of spoken language. Studies on the neural correlates of reading in different languages show that this skill relies on access to brain areas dedicated to speech processing. Speech-reading convergence onto a common perisylvian network is therefore considered universal among different writing systems. Using fMRI, we tested whether this also holds true for tactile Braille reading in the blind. The neural networks for Braille and visual reading overlapped in the left ventral occipitotemporal (vOT) cortex. Even though we showed similar perisylvian specialization for speech in both groups, blind subjects did not engage this speech system for reading. In contrast to the sighted, speech-reading convergence in the blind was absent in the perisylvian network. Instead, the blind engaged vOT not only in reading but also in speech processing. The involvement of the vOT in speech processing and its engagement in reading in the blind suggest that vOT is included in a modality-independent language network in the blind, as also evidenced by the functional connectivity results. The analysis of individual speech-reading convergence suggests that there may be segregated neuronal populations in the vOT for speech processing and reading in the blind.
Subject(s)
Blindness/physiopathology, Lipreading, Nerve Net/physiology, Occipital Lobe/physiology, Reading, Temporal Lobe/physiology, Touch/physiology, Acoustic Stimulation/methods, Adolescent, Adult, Blindness/diagnostic imaging, Communication Aids for Disabled, Female, Humans, Magnetic Resonance Imaging/methods, Male, Middle Aged, Nerve Net/diagnostic imaging, Neuronal Plasticity/physiology, Occipital Lobe/diagnostic imaging, Photic Stimulation/methods, Temporal Lobe/diagnostic imaging, Young Adult
ABSTRACT
Visual information conveyed by a speaking face aids speech perception. In addition, children's ability to comprehend visual-only speech (speechreading ability) is related to phonological awareness and reading skills in both deaf and hearing children. We tested whether training speechreading would improve speechreading, phoneme blending, and reading ability in hearing children. Ninety-two hearing 4- to 5-year-old children were randomised into two groups: business-as-usual controls, and an intervention group, who completed three weeks of computerised speechreading training. The intervention group showed greater improvements in speechreading than the control group at post-test both immediately after training and 3 months later. This was the case for both trained and untrained words. There were no group effects on the phonological awareness or single-word reading tasks, although those with the lowest phoneme blending scores did show greater improvements in blending as a result of training. The improvement in speechreading in hearing children following brief training is encouraging. The results are also important in suggesting a hypothesis for future investigation: that a focus on visual speech information may contribute to phonological skills, not only in deaf children but also in hearing children who are at risk of reading difficulties. A video abstract of this article can be viewed at https://www.youtube.com/watch?v=bBdpliGkbkY.
Subject(s)
Deafness, Lipreading, Child, Preschool, Hearing, Humans, Phonetics, Reading
ABSTRACT
INTRODUCTION: Patients with postlingual deafness usually depend on visual information for communication, and their lipreading ability could influence cochlear implantation (CI) outcomes. However, it is unclear whether preoperative visual dependency in postlingual deafness positively or negatively affects auditory rehabilitation after CI. Herein, we investigated the influence of preoperative audiovisual perception on CI outcomes. METHOD: In this retrospective case-comparison study, 118 patients with postlingual deafness who underwent unilateral CI were enrolled. Evaluation of speech perception was performed under both audiovisual (AV) and audio-only (AO) conditions before and after CI. Before CI, the speech perception test was performed under hearing aid (HA)-assisted conditions. After CI, the speech perception test was performed under the CI-only condition. Only patients with preoperative AO speech perception scores of 10% or less were included. RESULTS: Multivariable regression analysis showed that age, gender, residual hearing, operation side, education level, and HA usage were not correlated with either postoperative AV (pAV) or AO (pAO) speech perception. However, duration of deafness showed a significant negative correlation with both pAO (p = 0.003) and pAV (p = 0.015) speech perception. Notably, the preoperative AV speech perception score was not correlated with pAO speech perception (R2 = 0.00134, p = 0.693) but was positively associated with pAV speech perception (R2 = 0.0731, p = 0.003). CONCLUSION: Preoperative dependency on audiovisual information may positively influence pAV speech perception in patients with postlingual deafness.
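The multivariable analysis reported above, regressing postoperative speech perception on candidate predictors and inspecting coefficients, p values, and R2, can be sketched as follows. The data frame is simulated and the column names are assumptions about how the predictors might be coded.

```python
"""Sketch: a multivariable linear regression of a postoperative speech
perception score on preoperative predictors.  Data are simulated."""
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 118

df = pd.DataFrame({
    "duration_deafness_yrs": rng.uniform(0, 30, n),
    "age_yrs": rng.uniform(20, 80, n),
    "preop_av_score": rng.uniform(0, 100, n),       # preoperative audiovisual score (%)
})
# Simulated outcome: longer deafness hurts, preoperative AV score helps (toy effect sizes)
df["postop_av_score"] = (60 - 0.8 * df["duration_deafness_yrs"]
                         + 0.2 * df["preop_av_score"] + rng.normal(0, 10, n))

X = sm.add_constant(df[["duration_deafness_yrs", "age_yrs", "preop_av_score"]])
fit = sm.OLS(df["postop_av_score"], X).fit()

print(fit.params.round(2))     # regression coefficients
print(fit.pvalues.round(3))    # which predictors come out as significant
print(round(fit.rsquared, 3))  # overall variance explained
```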
Subject(s)
Cochlear Implantation, Cochlear Implants, Deafness/surgery, Hearing/physiology, Speech Perception/physiology, Adult, Aged, Case-Control Studies, Deafness/physiopathology, Female, Hearing Tests, Humans, Lipreading, Male, Middle Aged, Retrospective Studies, Therapeutics
Subject(s)
Coronavirus Infections, Deafness/physiopathology, Faculty, Lipreading, Masks, Pandemics, Pneumonia, Viral, Students, Teaching, COVID-19, Coronavirus Infections/epidemiology, Formative Feedback, Humans, Pneumonia, Viral/epidemiology, Universities
ABSTRACT
When comprehending speech-in-noise (SiN), younger and older adults benefit from seeing the speaker's mouth, i.e. visible speech. Younger adults additionally benefit from manual iconic co-speech gestures. Here, we investigate to what extent younger and older adults benefit from perceiving both visual articulators while comprehending SiN, and whether this is modulated by working memory and inhibitory control. Twenty-eight younger and 28 older adults performed a word recognition task in three visual contexts: mouth blurred (speech-only), visible speech, or visible speech + iconic gesture. The speech signal was either clear or embedded in multitalker babble. Additionally, there were two visual-only conditions (visible speech, visible speech + gesture). Accuracy levels for both age groups were higher when both visual articulators were present compared to either one or none. However, older adults received a significantly smaller benefit than younger adults, although they performed equally well in speech-only and visual-only word recognition. Individual differences in verbal working memory and inhibitory control partly accounted for age-related performance differences. To conclude, perceiving iconic gestures in addition to visible speech improves younger and older adults' comprehension of SiN. Yet, the ability to benefit from this additional visual information is modulated by age and verbal working memory. Future research will have to show whether these findings extend beyond the single word level.
Subject(s)
Aging/psychology, Lipreading, Memory, Short-Term, Nonverbal Communication/psychology, Sign Language, Speech Perception, Age Factors, Aged, Comprehension, Gestures, Humans, Noise, Signal Detection, Psychological, Visual Perception, Young Adult
ABSTRACT
The ability to see a talker's face improves speech intelligibility in noise, provided that the auditory and visual speech signals are approximately aligned in time. However, the importance of spatial alignment between corresponding faces and voices remains unresolved, particularly in multi-talker environments. In a series of online experiments, we investigated this using a task that required participants to selectively attend a target talker in noise while ignoring a distractor talker. In experiment 1, we found improved task performance when the talkers' faces were visible, but only when corresponding faces and voices were presented in the same hemifield (spatially aligned). In experiment 2, we tested for possible influences of eye position on this result. In auditory-only conditions, directing gaze toward the distractor voice reduced performance, but this effect could not fully explain the cost of audio-visual (AV) spatial misalignment. Lowering the signal-to-noise ratio (SNR) of the speech from +4 to -4 dB increased the magnitude of the AV spatial alignment effect (experiment 3), but accurate closed-set lipreading caused a floor effect that influenced results at lower SNRs (experiment 4). Taken together, these results demonstrate that spatial alignment between faces and voices contributes to the ability to selectively attend AV speech.
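Setting the SNR of a speech-in-babble mixture, as in the +4 to -4 dB manipulation above, comes down to rescaling the masker relative to the target. A minimal sketch, with synthetic signals standing in for recorded speech and babble:

```python
"""Sketch: scaling a babble masker to a target signal-to-noise ratio."""
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Return speech + noise with the noise rescaled to the requested SNR (dB)."""
    noise = noise[: speech.size]
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    return speech + noise * (target_noise_rms / noise_rms)

fs = 16000
rng = np.random.default_rng(5)
speech = rng.standard_normal(fs * 2)   # stand-in for a 2 s target sentence
babble = rng.standard_normal(fs * 2)   # stand-in for multi-talker babble

for snr in (4, 0, -4):
    mixture = mix_at_snr(speech, babble, snr)
    achieved = 20 * np.log10(np.sqrt(np.mean(speech ** 2))
                             / np.sqrt(np.mean((mixture - speech) ** 2)))
    print(f"target {snr:+d} dB SNR -> achieved {achieved:+.1f} dB")
```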