Results 1 - 20 of 1,010
1.
Sci Adv ; 10(26): eado9576, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38924408

ABSTRACT

Lip language recognition urgently needs wearable and easy-to-use interfaces for interference-free, high-fidelity acquisition of lip movements, together with data-efficient methods for modeling the decoder. Existing solutions suffer from unreliable lip reading, are data hungry, and generalize poorly. Here, we propose a wearable lip language decoding technology that enables interference-free and high-fidelity acquisition of lip movements and data-efficient recognition of fluent lip language, based on wearable motion capture and continuous lip speech movement reconstruction. The method allows us to artificially generate any desired continuous speech dataset from a very limited corpus of word samples recorded from users. By using these artificial datasets to train the decoder, we achieve an average accuracy of 92.0% across individuals (n = 7) for actual continuous and fluent lip speech recognition of 93 English sentences, while imposing no training burden on users because all training datasets are artificially generated. Our method greatly minimizes users' training/learning load and presents a data-efficient and easy-to-use paradigm for lip language recognition.
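
The abstract does not detail the reconstruction algorithm, but the general idea of synthesizing continuous training sentences from a small per-user corpus of word-level motion-capture clips can be sketched as below; the cross-fade blending, channel count, and function names are illustrative assumptions, not the authors' method.

```python
import numpy as np

def synthesize_sentence(word_clips, blend_frames=5):
    """Concatenate word-level lip-motion clips (each an array of shape
    [frames, channels]) into one continuous 'sentence', cross-fading at
    the boundaries to approximate smooth transitions between words."""
    sentence = word_clips[0].astype(float)
    for clip in word_clips[1:]:
        clip = clip.astype(float)
        k = min(blend_frames, len(sentence), len(clip))
        w = np.linspace(0.0, 1.0, k)[:, None]          # fade-in weights
        blended = (1 - w) * sentence[-k:] + w * clip[:k]
        sentence = np.concatenate([sentence[:-k], blended, clip[k:]])
    return sentence

# Example: three hypothetical word templates recorded from a user,
# each a short multichannel motion-capture segment.
rng = np.random.default_rng(0)
vocab = {w: rng.standard_normal((30, 6)) for w in ["please", "open", "door"]}
sample = synthesize_sentence([vocab[w] for w in ["please", "open", "door"]])
print(sample.shape)   # (frames, 6) continuous training example
```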


Subject(s)
Speech , Wearable Electronic Devices , Humans , Language , Lip/physiology , Movement , Male , Female , Adult , Lipreading , Motion Capture
2.
Multisens Res ; 37(3): 243-259, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38777333

ABSTRACT

Auditory speech can be difficult to understand, but seeing the articulatory movements of a speaker can drastically improve spoken-word recognition and, in the longer term, helps listeners adapt to acoustically distorted speech. Given that individuals with developmental dyslexia (DD) have sometimes been reported to rely less on lip-read speech than typical readers, we examined lip-read-driven adaptation to distorted speech in a group of adults with DD (N = 29) and a comparison group of typical readers (N = 29). Participants were presented with acoustically distorted Dutch words (six-channel noise-vocoded speech, NVS) in audiovisual training blocks (where the speaker could be seen) interspersed with audio-only test blocks. Results showed that words were more accurately recognized if the speaker could be seen (a lip-read advantage) and that performance steadily improved across subsequent auditory-only test blocks (adaptation). There were no group differences, suggesting that perceptual adaptation to disrupted spoken words is comparable for dyslexic and typical readers. These data open up a research avenue to investigate the degree to which lip-read-driven speech adaptation generalizes across different types of auditory degradation and across dyslexic readers with decoding versus comprehension difficulties.


Subject(s)
Dyslexia , Lipreading , Reading , Speech Perception , Humans , Speech Perception/physiology , Male , Female , Dyslexia/physiopathology , Adult , Young Adult , Adaptation, Physiological/physiology , Noise , Acoustic Stimulation
3.
Hear Res ; 447: 109023, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38733710

ABSTRACT

Limited auditory input, whether caused by hearing loss or by electrical stimulation through a cochlear implant (CI), can be compensated for by the remaining senses. Specifically for CI users, previous studies reported not only improved visual skills, but also altered cortical processing of unisensory visual and auditory stimuli. However, in multisensory scenarios, it is still unclear how auditory deprivation (before implantation) and electrical hearing experience (after implantation) affect cortical audiovisual speech processing. Here, we present a prospective longitudinal electroencephalography (EEG) study that systematically examined the deprivation- and CI-induced alterations of cortical processing of audiovisual words by comparing event-related potentials (ERPs) in postlingually deafened CI users before and after implantation (five weeks and six months of CI use). A group of matched normal-hearing (NH) listeners served as controls. The participants performed a word-identification task with congruent and incongruent audiovisual words, focusing their attention on either the visual (lip movement) or the auditory speech signal. This allowed us to study the (top-down) attention effect on the (bottom-up) sensory cortical processing of audiovisual speech. When compared to the NH listeners, the CI candidates (before implantation) and the CI users (after implantation) exhibited enhanced lipreading abilities and an altered cortical response at the N1 latency range (90-150 ms), characterized by decreased theta oscillation power (4-8 Hz) and a smaller amplitude in the auditory cortex. After implantation, however, the auditory-cortex response gradually increased and developed stronger intra-modal connectivity. Nevertheless, task efficiency and activation in the visual cortex were significantly modulated in both groups by focusing attention on the visual as compared to the auditory speech signal, with the NH listeners additionally showing an attention-dependent decrease in beta oscillation power (13-30 Hz). In sum, these results suggest remarkable deprivation effects on audiovisual speech processing in the auditory cortex, which partially reverse after implantation. Although even experienced CI users still show distinct audiovisual speech processing compared to NH listeners, pronounced effects of (top-down) direction of attention on (bottom-up) audiovisual processing can be observed in both groups. However, NH listeners, but not CI users, appear to show enhanced allocation of cognitive resources in visually attended as compared to auditorily attended audiovisual speech conditions, which supports our behavioural observations of poorer lipreading abilities and reduced visual influence on audition in NH listeners as compared to CI users.


Subject(s)
Acoustic Stimulation , Attention , Cochlear Implantation , Cochlear Implants , Deafness , Electroencephalography , Persons With Hearing Impairments , Photic Stimulation , Speech Perception , Humans , Male , Female , Middle Aged , Cochlear Implantation/instrumentation , Adult , Prospective Studies , Longitudinal Studies , Persons With Hearing Impairments/psychology , Persons With Hearing Impairments/rehabilitation , Deafness/physiopathology , Deafness/rehabilitation , Deafness/psychology , Case-Control Studies , Aged , Visual Perception , Lipreading , Time Factors , Hearing , Evoked Potentials, Auditory , Auditory Cortex/physiopathology , Evoked Potentials
4.
Lang Speech Hear Serv Sch ; 55(3): 756-766, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38630019

ABSTRACT

PURPOSE: The purpose of the present study was to investigate the relationship between speechreading ability, phonological skills, and word reading ability in typically developing children. METHOD: Sixty-six typically developing children (6-7 years old) completed tasks measuring word reading, speechreading (words, sentences, and short stories), alliteration awareness, rhyme awareness, nonword reading, and rapid automatized naming (RAN). RESULTS: Speechreading ability was significantly correlated with rhyme and alliteration awareness, phonological error rate, nonword reading, and reading ability (medium effect sizes) and RAN (small effect size). Multiple regression analyses showed that speechreading was not a unique predictor of word reading ability beyond the contribution of phonological skills. A speechreading error analysis revealed that children tended to use a phonological strategy when speechreading, and in particular, this strategy was used by skilled speechreaders. CONCLUSIONS: The current study provides converging evidence that speechreading and phonological skills are positively related in typically developing children. These skills are likely to have a reciprocal relationship, and children may benefit from having their attention drawn to visual information available on the lips while learning letter sounds or learning to read, as this could augment and strengthen underlying phonological representations.


Subject(s)
Lipreading , Phonetics , Reading , Humans , Child , Male , Female
5.
PLoS One ; 19(3): e0300926, 2024.
Article in English | MEDLINE | ID: mdl-38551907

ABSTRACT

To examine visual speech perception (i.e., lip-reading), we created a multi-layer network (the AV-net) that contained: (1) an auditory layer with nodes representing phonological word-forms and edges connecting words that were phonologically related, and (2) a visual layer with nodes representing the viseme representations of words and edges connecting viseme representations that differed by a single viseme (and additional edges to connect related nodes in the two layers). The results of several computer simulations (in which activation diffused across the network to simulate word identification) are reported and compared to the performance of human participants who identified the same words in a condition in which audio and visual information were both presented (Simulation 1), in an audio-only presentation condition (Simulation 2), and a visual-only presentation condition (Simulation 3). Another simulation (Simulation 4) examined the influence of phonological information on visual speech perception by comparing performance in the multi-layer AV-net to a single-layer network that contained only a visual layer with nodes representing the viseme representations of words and edges connecting viseme representations that differed by a single viseme. We also report the results of several analyses of the errors made by human participants in the visual-only presentation condition. The results of our analyses have implications for future research and training of lip-reading, and for the development of automatic lip-reading devices and software for individuals with certain developmental or acquired disorders or for listeners with normal hearing in noisy conditions.
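
As a rough illustration of the AV-net idea (not the study's actual lexicon, edge set, or diffusion parameters), a two-layer graph with spreading activation might look like this minimal sketch:

```python
# Minimal sketch of a two-layer AV-net-style graph with spreading activation.
# Node names, edges, and parameters are illustrative, not the study's materials.

edges = {
    # auditory (phonological) layer: phonologically related word forms
    ("bat@A", "cat@A"), ("bat@A", "mat@A"), ("cat@A", "mat@A"),
    # visual (viseme) layer: viseme strings differing by one viseme
    ("bat@V", "mat@V"), ("bat@V", "pat@V"),
    # cross-layer links between a word and its viseme representation
    ("bat@A", "bat@V"), ("cat@A", "cat@V"), ("mat@A", "mat@V"),
}

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def spread(activation, adj, retain=0.5, steps=3):
    """Each step, every node keeps `retain` of its activation and splits
    the remainder evenly among its neighbours (simple diffusion)."""
    for _ in range(steps):
        nxt = {n: retain * a for n, a in activation.items()}
        for n, a in activation.items():
            share = (1 - retain) * a / max(len(adj.get(n, ())), 1)
            for m in adj.get(n, ()):
                nxt[m] = nxt.get(m, 0.0) + share
        activation = nxt
    return activation

# Stimulate the viseme node for "bat" (visual-only presentation) and see
# which word forms accumulate the most activation.
final = spread({"bat@V": 1.0}, adj)
print(sorted(final.items(), key=lambda kv: -kv[1])[:4])
```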


Subject(s)
Speech Perception , Humans , Speech Perception/physiology , Visual Perception/physiology , Lipreading , Speech , Linguistics
6.
Ear Hear ; 45(1): 164-173, 2024.
Article in English | MEDLINE | ID: mdl-37491715

ABSTRACT

OBJECTIVES: Speech perception training can be a highly effective intervention to improve perception and language abilities in children who are deaf or hard of hearing. Most studies of speech perception training, however, only measure gains immediately following training, and only a minority include a follow-up assessment after a period without training. A critical unanswered question was whether training-related benefits are retained for a period of time after training has stopped. A primary goal of this investigation was to determine whether children retained training-related benefits 4 to 6 weeks after they completed 16 hours of formal speech perception training. Training comprised either auditory or speechreading training, or a combination of both. It is also important to determine whether "booster" training can help increase gains made during the initial intensive training period, so another goal of the study was to investigate the benefits of providing home-based booster training during the 4- to 6-week interval after the formal training ceased. The original investigation (Tye-Murray et al., 2022) compared the effects of talker familiarity and the relative benefits of the different types of training. We predicted that the children who received no additional training would retain the gains after completing the formal training, and that those children who completed the booster training would realize additional gains. DESIGN: Children, 6 to 12 years old, with hearing loss who had previously participated in the original randomized control study returned 4 to 6 weeks after its conclusion to take a follow-up speech perception assessment. The first group (n = 44) returned after receiving no formal intervention from the research team before the follow-up assessment. A second group of 40 children completed an additional 16 hours of speech perception training at home during a 4- to 6-week interval before the follow-up speech perception assessment. The home-based speech perception training was a continuation of the same training that was received in the laboratory, formatted to work on a PC tablet with a portable speaker. The follow-up speech perception assessment included measures of listening and speechreading, with test items spoken by both familiar (trained) and unfamiliar (untrained) talkers. RESULTS: In the group that did not receive the booster training, follow-up testing showed retention of all gains that were obtained immediately following the laboratory-based training. The group that received booster training during the same interval also maintained the benefits from the formal training, with some indication of minor improvement. CONCLUSIONS: Clinically, the present findings are extremely encouraging; the group that did not receive home-based booster training retained the benefits obtained during the laboratory-based training regimen. Moreover, the results suggest that self-paced booster training maintained the relative training gains associated with talker familiarity and training type seen immediately following laboratory-based training. Future aural rehabilitation programs should include maintenance training at home to supplement the speech perception training conducted under more formal conditions at school or in the clinic.


Subject(s)
Correction of Hearing Impairment , Deafness , Hearing Loss , Speech Perception , Child , Humans , Hearing Loss/rehabilitation , Lipreading , Correction of Hearing Impairment/methods
7.
J Speech Lang Hear Res ; 66(12): 5109-5128, 2023 12 11.
Article in English | MEDLINE | ID: mdl-37934877

ABSTRACT

PURPOSE: The COVID-19 pandemic led to the implementation of preventive measures that exacerbated communication difficulties for individuals with hearing loss. This study aims to explore how adults with hearing loss perceived the communication difficulties caused by these measures, and their experiences with communication 1 year after the measures were adopted. METHOD: Individual semistructured interviews were conducted via videoconference with six adults with hearing loss from the province of Québec, Canada. Data were examined using qualitative content analysis. RESULTS: The study found that face masks and in-person work (i.e., as opposed to remote work) were important barriers to communication because of hindered lipreading and competing noise in many workplaces. In contrast, preventive measures that allowed visual information transmission (e.g., transparent face masks, fixed plastic partitions) were considered favorable for communication. Communication partners were perceived as playing an important role in communication success with preventive measures: Familiar communication partners improved communication, whereas those with poor attitudes or strategies hindered communication. Participants found that videoconferences could provide satisfactory communication but were sometimes hindered by issues such as bad audiovisual quality or too many participants. CONCLUSIONS: This study identified reduced access to speech reading and lack of general awareness about hearing issues as key barriers to communication during the pandemic. The decreased communication capabilities were perceived to be most problematic at work and during health appointments, and tended to cause frustration, anxiety, self-esteem issues, and social isolation. Suggestions are outlined for current and future public health measures to better consider the experience of people with hearing loss.


Subject(s)
COVID-19 , Deafness , Hearing Loss , Adult , Humans , Pandemics , COVID-19/prevention & control , Lipreading
8.
Neuroimage ; 282: 120391, 2023 11 15.
Article in English | MEDLINE | ID: mdl-37757989

ABSTRACT

There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
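
The analysis described here asks how much viseme features explain the EEG beyond motion and lip-movement features. A toy encoding-model sketch of that comparison, using simple ridge regression on synthetic data (variable names, dimensions, and the regularization are assumptions, not the authors' pipeline):

```python
import numpy as np

def encoding_gain(eeg, motion_feats, viseme_feats, lam=1.0):
    """Ridge-regression sketch: how much better is EEG predicted when
    viseme features are added on top of motion/lip features?
    eeg: (time, channels); feats: (time, n_features)."""
    def ridge_r(X, y):
        X = np.column_stack([np.ones(len(X)), X])
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        pred = X @ w
        return np.mean([np.corrcoef(pred[:, c], y[:, c])[0, 1]
                        for c in range(y.shape[1])])
    base = ridge_r(motion_feats, eeg)
    full = ridge_r(np.column_stack([motion_feats, viseme_feats]), eeg)
    return base, full, full - base

rng = np.random.default_rng(3)
visemes = rng.standard_normal((2000, 10))
motion = rng.standard_normal((2000, 3))
eeg = visemes @ rng.standard_normal((10, 32)) * 0.1 + rng.standard_normal((2000, 32))
print(encoding_gain(eeg, motion, visemes))   # gain > 0 when visemes carry signal
```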


Subject(s)
Auditory Cortex , Speech Perception , Humans , Lipreading , Speech Perception/physiology , Brain/physiology , Auditory Cortex/physiology , Phonetics , Visual Perception/physiology
9.
Sensors (Basel) ; 23(4)2023 Feb 11.
Article in English | MEDLINE | ID: mdl-36850648

ABSTRACT

The current accuracy of speech recognition can reach over 97% on different datasets, but in noisy environments it drops sharply, and improving speech recognition performance in noisy environments remains a challenging task. Because visual information is not affected by acoustic noise, researchers often use lip information to help improve speech recognition performance, which makes the performance of lip reading and the effect of cross-modal fusion particularly important. In this paper, we try to improve the accuracy of speech recognition in noisy environments by improving lip-reading performance and the cross-modal fusion effect. First, because the same lip movements can correspond to multiple meanings, we constructed a one-to-many mapping model between lip movements and speech, allowing the lip-reading model to consider which articulations the input lip movements represent. Audio representations are also preserved by modeling the inter-relationships between paired audiovisual representations; at the inference stage, the preserved audio representations can be retrieved from memory through the learned inter-relationships using only video input. Second, a joint cross-fusion model using the attention mechanism can effectively exploit complementary intermodal relationships; the model calculates cross-attention weights on the basis of the correlations between joint feature representations and the individual modalities. Finally, our proposed model achieved a 4.0% reduction in WER in a -15 dB SNR environment compared to the baseline method, and a 10.1% reduction in WER compared to audio-only speech recognition. The experimental results show that our method achieves a significant improvement over speech recognition models in different noise environments.
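
A hedged sketch of attention-based cross-modal fusion of this kind, in which a joint representation queries each modality; the dimensions, layer choices, and class/variable names below are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class JointCrossFusion(nn.Module):
    """Sketch of attention-based cross-modal fusion: a joint representation
    queries each modality, and the attended streams are combined."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.joint_proj = nn.Linear(2 * dim, dim)
        self.attn_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, audio_feats, video_feats):
        # audio_feats, video_feats: (batch, time, dim)
        joint = self.joint_proj(torch.cat([audio_feats, video_feats], dim=-1))
        a_ctx, _ = self.attn_audio(joint, audio_feats, audio_feats)
        v_ctx, _ = self.attn_video(joint, video_feats, video_feats)
        return self.out(torch.cat([a_ctx, v_ctx], dim=-1))

fusion = JointCrossFusion()
a = torch.randn(2, 75, 256)   # e.g. 75 audio-feature frames
v = torch.randn(2, 75, 256)   # e.g. 75 lip-ROI feature frames
print(fusion(a, v).shape)     # torch.Size([2, 75, 256])
```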


Subject(s)
Lipreading , Speech Perception , Humans , Speech , Learning , Lip
10.
Sci Rep ; 13(1): 928, 2023 01 17.
Article in English | MEDLINE | ID: mdl-36650188

ABSTRACT

In this work, we propose a framework to enhance the communication abilities of speech-impaired patients in an intensive care setting via reading lips. Medical procedures, such as a tracheotomy, can cause the patient to lose the ability to utter speech with little to no impact on habitual lip movement. Consequently, we developed a framework to predict the silently spoken text by performing visual speech recognition, i.e., lip-reading. In a two-stage architecture, frames of the patient's face are used to infer audio features as an intermediate prediction target, which are then used to predict the uttered text. To the best of our knowledge, this is the first approach to bring visual speech recognition into an intensive care setting. For this purpose, we recorded an audio-visual dataset in the University Hospital of Aachen's intensive care unit (ICU) with a language corpus hand-picked by experienced clinicians to be representative of their day-to-day routine. With a word error rate of 6.3%, the trained system reaches a sufficient overall performance to significantly increase the quality of communication between patient and clinician or relatives.
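
The two-stage idea (video frames → intermediate audio features → text) could be sketched roughly as below; layer sizes, the character inventory, and the audio-feature dimensionality are assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class LipToText(nn.Module):
    """Two-stage sketch: (1) infer audio-like features from lip-region video,
    (2) decode text from those features. Layer sizes are illustrative."""
    def __init__(self, n_mels=80, n_chars=40):
        super().__init__()
        # Stage 1: video frames -> per-frame "audio" features
        self.visual = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),   # keep time axis, pool space
        )
        self.to_audio = nn.Linear(32 * 4 * 4, n_mels)
        # Stage 2: audio-like features -> character logits (e.g. for CTC)
        self.decoder = nn.GRU(n_mels, 128, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(256, n_chars)

    def forward(self, video):                     # video: (B, 1, T, H, W)
        x = self.visual(video)                    # (B, 32, T, 4, 4)
        x = x.permute(0, 2, 1, 3, 4).flatten(2)   # (B, T, 32*4*4)
        audio_feats = self.to_audio(x)            # intermediate target
        h, _ = self.decoder(audio_feats)
        return audio_feats, self.classifier(h)    # features + char logits

model = LipToText()
frames = torch.randn(2, 1, 50, 64, 64)            # 50 mouth-region frames
audio_feats, logits = model(frames)
print(audio_feats.shape, logits.shape)            # (2, 50, 80) (2, 50, 40)
```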


Subject(s)
Speech Perception , Humans , Speech , Lipreading , Language , Critical Care
11.
Brain Behav ; 13(2): e2869, 2023 02.
Article in English | MEDLINE | ID: mdl-36579557

ABSTRACT

INTRODUCTION: Few of us are skilled lipreaders; most struggle with the task. Neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood. METHODS: We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the Blood Oxygenation Level Dependent (BOLD) signal time courses. RESULTS: Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading, probably paralleling the limited understanding obtained via lip-reading. Importantly, higher lipreading test scores and subjective estimates of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex. CONCLUSIONS: Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results might suggest an efficient use of brain areas dealing with phonological processing in skilled lipreaders.
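
Voxel-wise inter-subject correlation (ISC) of the kind described is, in essence, a leave-one-out correlation of BOLD time courses across participants; a minimal sketch (array shapes and the leave-one-out averaging scheme are generic assumptions, not the study's exact pipeline):

```python
import numpy as np

def voxelwise_isc(bold):
    """Leave-one-out inter-subject correlation.
    bold: array of shape (subjects, timepoints, voxels).
    Returns (subjects, voxels) correlations of each subject's time course
    with the average time course of all remaining subjects."""
    n_subj = bold.shape[0]
    z = (bold - bold.mean(axis=1, keepdims=True)) / bold.std(axis=1, keepdims=True)
    isc = np.empty((n_subj, bold.shape[2]))
    for s in range(n_subj):
        others = np.delete(z, s, axis=0).mean(axis=0)
        others = (others - others.mean(0)) / others.std(0)
        isc[s] = (z[s] * others).mean(axis=0)
    return isc

rng = np.random.default_rng(1)
data = rng.standard_normal((10, 240, 5000))   # 10 subjects, 240 TRs, 5000 voxels
print(voxelwise_isc(data).shape)              # (10, 5000)
```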


Subject(s)
Lipreading , Speech Perception , Humans , Female , Brain , Auditory Perception , Cognition , Magnetic Resonance Imaging
12.
J Psychosoc Nurs Ment Health Serv ; 61(4): 18-26, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36198121

ABSTRACT

The goal of the current interpretive phenomenological study grounded in Heidegger's philosophies was to explore the experience of lipreaders when society was masked during the coronavirus disease 2019 pandemic. Participants were prelingually deafened English-speaking adults who predominantly relied on lip-reading and speaking for communication. Twelve in-depth email interviews were conducted with respondents recruited via social media. Thematic techniques of Benner were employed, and six themes emerged: Limiting of World Resulting in Negative Emotions, Increased Prominence of Deafness, Balancing Safety and Communication Access, Creative Resourcefulness, Resilience and Personal Growth, and Passage of Time to Bittersweet Freedom. Insights from this study clarify the need for psychosocial support of lipreaders during times of restricted communication access and awareness of accommodations to facilitate inclusion. [Journal of Psychosocial Nursing and Mental Health Services, 61(4), 18-26.].


Subject(s)
COVID-19 , Lipreading , Masks , Adult , Humans
13.
J Child Lang ; 50(1): 27-51, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36503546

ABSTRACT

This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between apparent successes of visual speech processing in young children in visual-looking tasks, with apparent difficulties of speech processing in older children from explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.


Subject(s)
Lipreading , Speech Perception , Adult , Child , Humans , Child, Preschool , Speech , Language Development , Cues
14.
Am Ann Deaf ; 167(3): 303-312, 2022.
Article in English | MEDLINE | ID: mdl-36314163

ABSTRACT

Perceptual restoration occurs when the brain restores missing segments from speech under certain conditions. It is investigated in the auditory modality, but minimal evidence has been collected during speechreading tasks. The authors measured perceptual restoration in speechreading by individuals with hearing loss and compared it to perceptual restoration in auditory speech by normally hearing individuals. Visual perceptual restoration for speechreading was measured in 33 individuals with profound hearing loss by blurring the keywords in silent video recordings of a speaker uttering a sentence. Auditory perceptual restoration was measured in 33 normally hearing individuals by distorting the keywords in spoken sentences. It was found that the amount of restoration was similar for speechreading through the visual modality by individuals with hearing loss and speech perception through the auditory modality by normally hearing individuals. These findings may facilitate understanding of speech processing by individuals with hearing loss.


Subject(s)
Deafness , Hearing Loss , Speech Perception , Adult , Humans , Lipreading , Hearing
15.
PLoS One ; 17(9): e0275585, 2022.
Article in English | MEDLINE | ID: mdl-36178907

ABSTRACT

Visual input is crucial for understanding speech under noisy conditions, but there are hardly any tools to assess the individual ability to lipread. With this study, we wanted to (1) investigate how linguistic characteristics of the language on the one hand and hearing impairment on the other hand affect lipreading abilities and (2) provide a tool to assess lipreading abilities for German speakers. A total of 170 participants (22 prelingually deaf) completed the online assessment, which consisted of a subjective hearing impairment scale and silent videos in which different item categories (numbers, words, and sentences) were spoken. The task for our participants was to recognize the spoken stimuli by visual inspection alone. We used different versions of one test and investigated the impact of item categories, word frequency in the spoken language, articulation, sentence frequency in the spoken language, sentence length, and differences between speakers on the recognition score. We found an effect of item categories, articulation, sentence frequency, and sentence length on the recognition score. With respect to hearing impairment, we found that higher subjective hearing impairment was associated with higher test scores. We did not find any evidence that prelingually deaf individuals show enhanced lipreading skills over people with postlingually acquired hearing impairment. However, we see an interaction with education only in the prelingually deaf, not in the population with postlingually acquired hearing loss. This suggests that different factors contribute to enhanced lipreading abilities depending on the onset of hearing impairment (prelingual vs. postlingual). Overall, lipreading skills vary strongly in the general population independently of hearing impairment. Based on our findings, we constructed a new and efficient lipreading assessment tool (SaLT) that can be used to test behavioral lipreading abilities in the German-speaking population.


Subject(s)
Deafness , Hearing Loss , Speech Perception , Humans , Language , Linguistics , Lipreading , Speech , Visual Perception
16.
Nat Commun ; 13(1): 5168, 2022 09 07.
Article in English | MEDLINE | ID: mdl-36071056

ABSTRACT

Lip-reading has become an important research challenge in recent years; the goal is to recognise speech from lip movements. Most lip-reading technologies developed so far are camera based and require video recording of the target. However, these technologies have well-known limitations regarding occlusion and ambient lighting, along with serious privacy concerns. Furthermore, vision-based technologies are not useful for multi-modal hearing aids in the coronavirus (COVID-19) environment, where face masks have become the norm. This paper aims to overcome the fundamental limitations of camera-based systems by proposing a radio frequency (RF)-based lip-reading framework that can read lips under face masks. The framework employs Wi-Fi and radar technologies as enablers of RF-sensing-based lip-reading. A dataset comprising the vowels A, E, I, O, U and empty (static/closed lips) is collected using both technologies, with a face mask worn. The collected data are used to train machine learning (ML) and deep learning (DL) models. A high classification accuracy of 95% is achieved on the Wi-Fi data using neural network (NN) models. Moreover, similar accuracy is achieved by a VGG16 deep learning model on the collected radar-based dataset.
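
Training a small classifier on such RF data could be sketched as below; the input shape, network depth, and training loop are placeholders, not the paper's NN or VGG16 setup:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: each sample is a spectrogram-like RF frame
# (e.g. Wi-Fi CSI or radar micro-Doppler), labelled with one of six classes.
CLASSES = ["A", "E", "I", "O", "U", "empty"]

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, len(CLASSES)),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 64, 64)                 # a dummy batch of RF "images"
y = torch.randint(0, len(CLASSES), (8,))
for _ in range(3):                            # toy training loop
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print(float(loss))
```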


Subject(s)
COVID-19 , Masks , COVID-19/prevention & control , Humans , Lipreading , Neural Networks, Computer , Personal Protective Equipment
17.
J Neurosci ; 42(31): 6108-6120, 2022 08 03.
Article in English | MEDLINE | ID: mdl-35760528

ABSTRACT

Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explored phase-locking to auditory and visual signals in MEG recordings from 14 human participants (6 females, 8 males) who reported words from single spoken sentences. We manipulated the acoustic clarity and visual speech signals such that critical speech information was present in auditory, visual, or both modalities. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase-locking to the auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase-locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only speech that was matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus did not show above-chance partial coherence with visual speech signals during AV conditions but did show partial coherence in visual-only conditions. Hence, visual speech enabled stronger phase-locking to auditory signals in visual areas, whereas phase-locking of visual speech in auditory regions only occurred during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception. SIGNIFICANCE STATEMENT: Verbal communication in noisy environments is challenging, especially for hearing-impaired individuals. Seeing facial movements of communication partners improves speech perception when auditory signals are degraded or absent. The neural mechanisms supporting lip-reading or audio-visual benefit are not fully understood. Using MEG recordings and partial coherence analysis, we show that speech information is used differently in brain regions that respond to auditory and visual speech. While visual areas use visual speech to improve phase-locking to auditory speech signals, auditory areas do not show phase-locking to visual speech unless auditory speech is absent and visual speech is used to substitute for missing auditory signals. These findings highlight brain processes that combine visual and auditory signals to support speech understanding.
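
Partial coherence, as used here, removes the linear contribution of one signal before computing coherence between the other two. A textbook-style sketch with Welch cross-spectra (the sampling rate, window length, and toy signals are assumptions, not the authors' MEG pipeline):

```python
import numpy as np
from scipy.signal import csd

def partial_coherence(x, y, z, fs=250, nperseg=256):
    """Coherence between x and y after removing the linear contribution of z,
    computed from Welch cross-spectral densities."""
    f, Sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, Sxz = csd(x, z, fs=fs, nperseg=nperseg)
    _, Szy = csd(z, y, fs=fs, nperseg=nperseg)
    _, Sxx = csd(x, x, fs=fs, nperseg=nperseg)
    _, Syy = csd(y, y, fs=fs, nperseg=nperseg)
    _, Szz = csd(z, z, fs=fs, nperseg=nperseg)
    Sxy_z = Sxy - Sxz * Szy / Szz
    Sxx_z = Sxx - np.abs(Sxz) ** 2 / Szz
    Syy_z = Syy - np.abs(Szy) ** 2 / Szz
    return f, np.abs(Sxy_z) ** 2 / (np.real(Sxx_z) * np.real(Syy_z))

rng = np.random.default_rng(2)
shared = rng.standard_normal(5000)                  # common driver
meg = shared + 0.5 * rng.standard_normal(5000)      # toy "brain" signal
audio = shared + 0.5 * rng.standard_normal(5000)    # auditory envelope
lips = shared + 0.5 * rng.standard_normal(5000)     # lip aperture
f, pcoh = partial_coherence(meg, audio, lips)
print(pcoh[:5])   # coherence with audio after partialling out the lip signal
```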


Subject(s)
Auditory Cortex , Speech Perception , Visual Cortex , Acoustic Stimulation , Auditory Cortex/physiology , Auditory Perception , Female , Humans , Lipreading , Male , Speech/physiology , Speech Perception/physiology , Visual Cortex/physiology , Visual Perception/physiology
18.
eNeuro ; 9(3)2022.
Article in English | MEDLINE | ID: mdl-35728955

ABSTRACT

Speech is an intrinsically multisensory signal, and seeing the speaker's lips forms a cornerstone of communication in acoustically impoverished environments. Still, it remains unclear how the brain exploits visual speech for comprehension. Previous work debated whether lip signals are mainly processed along the auditory pathways or whether the visual system directly implements speech-related processes. To probe this, we systematically characterized dynamic representations of multiple acoustic and visual speech-derived features in source localized MEG recordings that were obtained while participants listened to speech or viewed silent speech. Using a mutual-information framework we provide a comprehensive assessment of how well temporal and occipital cortices reflect the physically presented signals and unique aspects of acoustic features that were physically absent but may be critical for comprehension. Our results demonstrate that both cortices feature a functionally specific form of multisensory restoration: during lip reading, they reflect unheard acoustic features, independent of co-existing representations of the visible lip movements. This restoration emphasizes the unheard pitch signature in occipital cortex and the speech envelope in temporal cortex and is predictive of lip-reading performance. These findings suggest that when seeing the speaker's lips, the brain engages both visual and auditory pathways to support comprehension by exploiting multisensory correspondences between lip movements and spectro-temporal acoustic cues.
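
A mutual-information framework of this kind ultimately reduces to estimating MI between brain and stimulus time series; a crude binned estimator is sketched below (the bin count and toy usage are assumptions, and the study's actual estimator is more sophisticated):

```python
import numpy as np

def binned_mi(x, y, bins=16):
    """Histogram estimate of mutual information (in bits) between two
    continuous time series; a crude stand-in for the study's MI framework."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(4)
envelope = rng.standard_normal(10000)                 # unheard speech envelope
meg = 0.6 * envelope + rng.standard_normal(10000)     # toy occipital signal
print(binned_mi(meg, envelope))                       # higher MI for related signals
print(binned_mi(rng.standard_normal(10000), envelope))
```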


Subject(s)
Lipreading , Speech Perception , Acoustic Stimulation , Acoustics , Humans , Speech
19.
Sensors (Basel) ; 22(9)2022 May 09.
Article in English | MEDLINE | ID: mdl-35591284

ABSTRACT

Concomitant with the recent advances in deep learning, automatic speech recognition and visual speech recognition (VSR) have received considerable attention. However, although VSR systems must identify speech from both frontal and profile faces in real-world scenarios, most VSR studies have focused solely on frontal face images. To address this issue, we propose an end-to-end sentence-level multi-view VSR architecture for faces captured from four different perspectives (frontal, 30°, 45°, and 60°). The encoder uses multiple convolutional neural networks with a spatial attention module to detect minor changes in the mouth patterns of similarly pronounced words, and the decoder uses cascaded local self-attention connectionist temporal classification to collect the details of local contextual information in the immediate vicinity, which results in a substantial performance boost and fast convergence. To evaluate the proposed model, the OuluVS2 dataset was divided into the four perspectives; compared with the existing state-of-the-art performance, the obtained improvements were 3.31% (0°), 4.79% (30°), 5.51% (45°), 6.18% (60°), and 4.95% (mean), and the average performance improved by 9.1% compared with the baseline. Thus, the suggested design enhances the performance of multi-view VSR and boosts its usefulness in real-world applications.
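
As a sketch of the encoder's spatial-attention component, a generic CBAM-style gate is shown below rather than the authors' exact module; the kernel size and feature shapes are assumptions:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Simple spatial-attention gate: pool across channels, predict a
    per-pixel weight map, and reweight the feature map (an illustrative
    stand-in for the encoder's attention module)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, feats):                       # feats: (B, C, H, W)
        avg = feats.mean(dim=1, keepdim=True)
        mx, _ = feats.max(dim=1, keepdim=True)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return feats * mask

# Toy use: attend over per-frame mouth features before sequence decoding
# (e.g. with a CTC-based decoder, as in the abstract).
attn = SpatialAttention()
frame_feats = torch.randn(4, 64, 12, 12)            # (batch, channels, H, W)
print(attn(frame_feats).shape)                      # torch.Size([4, 64, 12, 12])
```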


Subject(s)
Lipreading , Neural Networks, Computer , Attention , Humans , Language , Speech
20.
Sensors (Basel) ; 22(10)2022 May 13.
Article in English | MEDLINE | ID: mdl-35632141

ABSTRACT

Lipreading is a technique for analyzing sequences of lip movements and then recognizing the speech content of a speaker. Because the structure of our vocal organs limits the number of distinct articulations we can produce, homophones arise when speaking; on the other hand, different speakers produce different lip movements for the same word. To address these problems, this paper focuses on spatial-temporal feature extraction for word-level lipreading, and an efficient two-stream model is proposed to learn the relative dynamic information of lip motion. In this model, two CNN streams with different channel capacities are used to extract, respectively, static features within a single frame and dynamic information across multi-frame sequences. We explored a more effective convolution structure for each component in the front-end model, improving performance by about 8%. Then, according to the characteristics of word-level lipreading datasets, we further studied the impact of the two sampling methods on the fast and slow channels. Furthermore, we discussed the influence of the fusion methods of the front-end and back-end models under the two-stream network structure. Finally, we evaluated the proposed model on two large-scale lipreading datasets and achieved new state-of-the-art results.
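
A SlowFast-style two-stream front end of the kind described could be sketched as below; the channel counts, kernel sizes, and sampling stride are illustrative assumptions, not the published configuration:

```python
import torch
import torch.nn as nn

class TwoStreamFront(nn.Module):
    """Sketch of a two-stream front end: a higher-capacity stream on
    sparsely sampled frames for static detail, and a lightweight stream
    on all frames for lip-motion dynamics."""
    def __init__(self, slow_stride=4):
        super().__init__()
        self.slow_stride = slow_stride
        self.slow = nn.Sequential(                  # more channels, fewer frames
            nn.Conv3d(1, 64, (1, 5, 5), padding=(0, 2, 2)), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 1, 1)),
        )
        self.fast = nn.Sequential(                  # fewer channels, all frames
            nn.Conv3d(1, 8, (5, 5, 5), padding=(2, 2, 2)), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 1, 1)),
        )

    def forward(self, clip):                        # clip: (B, 1, T, H, W)
        slow_in = clip[:, :, :: self.slow_stride]   # temporal subsampling
        s = self.slow(slow_in).flatten(1)           # (B, 64)
        f = self.fast(clip).flatten(1)              # (B, 8)
        return torch.cat([s, f], dim=1)             # fused clip descriptor

model = TwoStreamFront()
clip = torch.randn(2, 1, 29, 88, 88)                # 29 mouth frames per word
print(model(clip).shape)                            # torch.Size([2, 72])
```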


Subject(s)
Algorithms , Lipreading , Humans , Learning , Motion , Movement